Suppose the expectation $E(F(X))$ is to be estimated by the empirical
averages of the values of $F$ on independent and identically
distributed samples $\{X_i\}$. A sampling rule called the "screened"
estimator is introduced, and its performance is studied. When the
mean $E(U(X))$ of a different function $U$ is known, the estimates are
"screened," in that we only consider those which correspond to times
when the empirical average of the $\{U(X_i)\}$ is sufficiently close to its
known mean. As long as $U$ dominates $F$ appropriately, the screened
estimates admit exponential error bounds, even when $F(X)$ is heavy-tailed.
The main results are several nonasymptotic, explicit exponential
bounds for the screened estimates. A geometric interpretation, in
the spirit of Sanov's theorem, is given for the fact that the screened
estimates always admit exponential error bounds, even if the standard
estimates do not. And when they do, the screened estimates'
error probability has a significantly better exponent. This implies
that screening can be interpreted as a variance reduction technique.
Our main mathematical tools come from large deviations techniques.
The results are illustrated by a detailed simulation example.

1. Introduction. Suppose we wish to estimate the expectation,

$$\mu := E(X^{3/4}) = \int_1^\infty x^{3/4} f(x)\,dx,$$

based on $n$ independent samples $X_1, X_2, \ldots, X_n$ drawn from some unknown
density $f$ on $[1,\infty)$. Suppose, also, we have reasons to suspect that $f$ has a
fairly heavy right tail, and assume that the only specific piece of information
we have available is the value of the mean of $f$, $\nu := E(X) = \int_1^\infty x f(x)\,dx$,
perhaps also its variance. Because of the heavy right tail, it is natural to
expect significant variability in the data $\{X_i\}$ as well as in the subsequent
estimates of $\mu$. For definiteness, assume that the unknown density is
$f(x) = \frac{5}{2x^{7/2}}$ for $x \ge 1$ [and $f(x) = 0$, otherwise], so that
$\mu = 10/7$ and $\nu = 5/3$.

Consider the simplest (and most commonly used) estimator for $\mu$: for
each $k \le n$, let $\hat S_k$ denote the empirical average of the transformed
samples $\{X_i^{3/4}\}$,

$$\hat S_k := \frac{1}{k}\sum_{i=1}^{k} X_i^{3/4}, \qquad 1 \le k \le n.$$

Although the law of large numbers guarantees that the sequence of estimates
$\{\hat S_k\}$ is consistent and the central limit theorem implies that the rate of
convergence is of order $n^{-1/2}$, a quick glance at the behavior of $\hat S_k$ for finite
$k$ reinforces the concern that the estimates are highly variable: The plots in
Figure 1 clearly indicate that, up to $k = n = 5000$, the $\{\hat S_k\}$ are still quite
far from having converged.
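The experiment described above can be reproduced with a short simulation. The following is a minimal sketch, assuming the Pareto density $f(x) = \frac{5}{2}x^{-7/2}$ on $[1,\infty)$ (the density consistent with the stated values $\mu = 10/7$ and $\nu = 5/3$); the function names and the seed are illustrative choices, not part of the paper.

```python
import random

def sample_pareto(rng):
    """Draw X with density f(x) = (5/2) x^(-7/2) on [1, inf) by inverse-CDF sampling."""
    # P(X > x) = x^(-5/2) for x >= 1, so X = V^(-2/5) with V uniform on (0, 1].
    v = 1.0 - rng.random()  # lies in (0, 1], which avoids 0 ** negative
    return v ** (-0.4)

def running_averages(values):
    """Return the averages of the first k values, for k = 1, ..., len(values)."""
    averages, total = [], 0.0
    for k, value in enumerate(values, start=1):
        total += value
        averages.append(total / k)
    return averages

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 5000
xs = [sample_pareto(rng) for _ in range(n)]
s = running_averages([x ** 0.75 for x in xs])  # running estimates of mu = 10/7
for k in (100, 1000, 5000):
    print(k, round(s[k - 1], 4))
```

Running this with several seeds gives a feel for how slowly the running averages settle near $10/7 \approx 1.4286$.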

Since $f$ is heavy-tailed, this irregular behavior is hardly surprising: Indeed,
as $n$ grows, the error probability $\Pr\{\hat S_n > \mu + \epsilon\}$ decays like

$$\Pr\{\hat S_n > \mu + \epsilon\} \sim \frac{1}{\epsilon^{10/3}\, n^{7/3}}, \qquad n \to \infty, \tag{1.1}$$

for any $\epsilon > 0$; see, for example, [12]. Therefore, unlike with most classical
exponential error bounds, here the error probability decays polynomially in
the sample size $n$, and with a rather small power at that.

This state of affairs is discouraging, but suppose we decide to use the
additional information we have about $f$, namely that its mean $\nu$ equals
5/3, in order to "screen" the estimates $\{\hat S_k\}$. This can be done as follows:
Together with the $\{\hat S_k\}$, also compute the empirical averages $\{\hat T_k\}$ of the
samples $\{X_i\}$ themselves,

$$\hat T_k := \frac{1}{k}\sum_{i=1}^{k} X_i, \qquad 1 \le k \le n,$$

and only consider estimates $\hat S_k$ at times $k$ when the corresponding average
$\hat T_k$ is within a fixed threshold $u > 0$ of its known mean. That is, only
examine $\hat S_k$ if at that same time $k$, $|\hat T_k - \nu| < u$.

This results in what we call in this paper the "screened estimator" of
$\mu$. Figure 2 illustrates its performance on four different realizations of the
above experiment.

Fig. 1. Two typical realizations of the estimates $\{\hat S_k\}$ for $k = 100, 101, \ldots, n = 5000$.

Fig. 2. Four typical realizations of the estimates $\{\hat S_k\}$ for $k = 100, 101, \ldots, n = 5000$. The
"screened estimates" are plotted in bold, and they are simply the original $\hat S_k$ at times $k$
when the corresponding empirical average $\hat T_k$ is within $u = 0.005$ of its mean $\nu = 5/3$.

More generally, assume $X, X_1, X_2, \ldots$ are independent and identically
distributed (i.i.d.) random variables with unknown distribution, and we wish
to estimate the expectation $\mu := E[F(X)]$ for a given function $F : \mathbb{R} \to \mathbb{R}$,
while we happen to know the value of the expectation $\nu := E[U(X)]$ of a
different function $U : \mathbb{R} \to \mathbb{R}$. In this general setting, we introduce:

The Screened Estimator. For each $k \ge 1$, together with the empirical averages
$\{\hat S_k\}$ of the $\{F(X_i)\}$ also compute the averages $\{\hat T_k\}$ of the $\{U(X_i)\}$, and only
consider estimates $\hat S_k$ at times $k$ when $\hat T_k$ is within a fixed threshold $u > 0$
of its mean, that is, $|\hat T_k - \nu| < u$.
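The sampling rule just defined amounts to a single pass over the data. Below is a minimal sketch; the function name `screened_estimates` and the toy data are hypothetical and serve only as an illustration.

```python
def screened_estimates(samples, F, U, nu, u):
    """Return (k, S_k) for every k at which the screening average T_k satisfies |T_k - nu| < u.

    S_k and T_k are the averages of F(x) and U(x) over the first k samples.
    """
    kept = []
    sum_f = sum_u = 0.0
    for k, x in enumerate(samples, start=1):
        sum_f += F(x)
        sum_u += U(x)
        if abs(sum_u / k - nu) < u:  # the screening test
            kept.append((k, sum_f / k))
    return kept

# Tiny deterministic illustration with made-up data (not from the paper).
data = [1.0, 3.0, 2.0, 2.0]
result = screened_estimates(data, F=lambda x: x * x, U=lambda x: x, nu=2.0, u=0.1)
print(result)
```

In this toy run the first time index is rejected, because the average of the first sample alone is far from the known mean 2, and the remaining three are kept.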
The intuition is simple. In cases when we suspect that the empirical
distribution $\hat P_k$ of the samples $\{X_i : i \le k\}$ is likely to be far from the true
underlying distribution $P$, we can check that the projection $\int U\,d\hat P_k = \hat T_k$
of $\hat P_k$ along a function $U$ is close to the projection $\int U\,dP = \nu$ of the true
distribution $P$ along $U$. Of course this does not guarantee that $\hat P_k \approx P$ or
that $\hat S_k \approx \mu$, but it does rule out instances $k$ when it is certain that $\hat P_k$ differs
significantly from $P$.

More importantly, as we shall see next, it is often possible to obtain
explicitly computable exponential error bounds for the screened estimator,
even when the error probability of the standard estimates $\{\hat S_k\}$ decays at a
polynomial rate.

The purpose of this paper is twofold. First, we provide a theoretical
explanation for the practical advantage of the screened estimator: We develop
general conditions under which the error probability of the screened
estimator decays exponentially, regardless of the tail of the distribution of the
$\{F(X_i)\}$. The main assumption is that $U$ dominates $F$ from above, in that
$\sup_x [F(x) - \beta U(x)]$ is finite for all $\beta > 0$, where the supremum is over all
$x$ in the support of $X$. Then we state and prove a number of explicit
exponential bounds for the error probability of the screened estimator, which
are easily computable and readily applicable to specific problems where the
only information we have about the unknown underlying distribution is the
mean and perhaps also the variance of $U(X)$ for a particular function $U$.
To illustrate, we return to the example of estimating the expectation
$\mu = E(X^{3/4})$ with respect to an unknown density $f$ on $[1,\infty)$, based on $n$
i.i.d. samples $X_1, \ldots, X_n$ drawn from $f$, and assuming that we only know
the mean (and perhaps some higher moments) of $X$. In the above notation,
this corresponds to $F(x) = x^{3/4}$ and $U(x) = x$. The proof of the following
proposition is given at the end of Section 3.

Proposition 1.1. (i) The error probability of the standard estimator
$\{\hat S_n\}$ decays to zero at a polynomial rate: If the density $f$ is given by
$f(x) = \frac{5}{2x^{7/2}}$ for $x \ge 1$, then for any $\epsilon > 0$,

$$\Pr\{\hat S_n - \mu > \epsilon\} \sim \frac{1}{\epsilon^{10/3}\, n^{7/3}}, \qquad n \to \infty.$$

(ii) The error probability of the screened estimator decays to zero
exponentially fast: If the only information we have about $f$ is that its mean $\nu$
equals 5/3, then we can conclude that for all $\epsilon, u > 0$ there exists $I(\epsilon,u) > 0$
such that

$$\Pr\{\hat S_n - \mu > \epsilon \text{ and } |\hat T_n - \tfrac{5}{3}| < u\} \le e^{-nI(\epsilon,u)} \qquad \text{for all } n \ge 1.$$

(iii) If, in addition, we know that the variance of $f$ equals 20/9, then an
explicit exponential bound can be computed: For any $\epsilon > 0$ and any
$0 < u \le \frac{\epsilon}{20}$,

$$\Pr\{\hat S_n - \mu > \epsilon \text{ and } |\hat T_n - \tfrac{5}{3}| < u\} \le e^{-(0.005)\times n\epsilon^2} \qquad \text{for all } n \ge 1.$$

(iv) If we also know that the value of the covariance between $X^{3/4}$ and $X$
under $f$ is 20/21, then the following more accurate bound can be obtained:
For any $\epsilon > 0$ and any $0 < u \le \frac{\epsilon}{20}$,

$$\Pr\{\hat S_n - \mu > \epsilon \text{ and } |\hat T_n - \tfrac{5}{3}| < u\} \le e^{-(0.0367)\times n\epsilon^2}, \tag{1.2}$$

for all $n \ge 1$.

As long as the mean of $X$ is known, we can employ the screened estimator
and be certain that it will have an exponentially small error probability,
whereas the standard estimator's probability of error may decay at least as
slowly as $n^{-7/3}$. If the variance of $X$ is also known, then for the specific
values in the simulation examples in Figure 2, with $\epsilon = 0.2$, $u = 0.005$ and
$n = 5000$, part (iii) of the proposition gives

$$\Pr\{\hat S_n - \mu > 0.2 \text{ and } |\hat T_n - \tfrac{5}{3}| < 0.005\} \le 0.368.$$

This is fairly weak, despite the fact that $\epsilon = 0.2$ is a rather moderate margin
of error. But the error probability does decay exponentially, and with
$n = 10{,}000$ samples the corresponding upper bound is only $\approx 0.136$, while for
$n = 15{,}000$ it is $\approx 0.0498$. And if, in addition, the value of the covariance
between $X^{3/4}$ and $X$ is available, then part (iv) gives a much more accurate
result even for smaller $\epsilon$: Taking $\epsilon = 0.1$, $u = 0.005$ and $n = 5000$,

$$\Pr\{\hat S_n - \mu > 0.1 \text{ and } |\hat T_n - \tfrac{5}{3}| < 0.005\} \le 0.1596,$$

and for $n = 10{,}000$ samples the corresponding bound is $\approx 0.025$.
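The numerical values quoted above follow from evaluating $e^{-c\,n\epsilon^2}$ with $c = 0.005$ for part (iii) and $c = 0.0367$ for part (iv). A minimal sketch of that arithmetic (the helper name `bound` is an illustrative choice):

```python
import math

def bound(rate, n, eps):
    """Evaluate exp(-rate * n * eps^2), the form of the bounds in parts (iii) and (iv)."""
    return math.exp(-rate * n * eps ** 2)

for n in (5000, 10000, 15000):
    print("part (iii), eps = 0.2, n = %5d: %.4f" % (n, bound(0.005, n, 0.2)))
for n in (5000, 10000):
    print("part (iv),  eps = 0.1, n = %5d: %.4f" % (n, bound(0.0367, n, 0.1)))
```

For part (iii) with $\epsilon = 0.2$ the exponent is $0.005 \times 0.04\, n = n/5000$, so the three bounds are $e^{-1}$, $e^{-2}$ and $e^{-3}$.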

Two points of caution are in order here. The first is perhaps somewhat
subtle and has to do with the interpretation of the above error bounds. What
exactly does (1.2) say? Is it the case that, at any time $k$ when $\hat T_k$ is within $u$
of its mean, we can apply (1.2) to obtain a bound on the probability of error
for the corresponding estimate $\hat S_k$? Strictly speaking, the answer is "no";
since the times at which the screening averages $\{\hat T_k\}$ are close to their mean
are random, (1.2) cannot be automatically invoked. A strict operational
interpretation of the mathematical statement in (1.2) is as follows: First
choose and fix an $n$ such that (1.2) offers a satisfactory guarantee on the
error probability; here $n$ may be the total number of samples available, or
it may be the number of samples we decide to generate from $f$. Then look
at $\hat T_n$, and if $|\hat T_n - \nu| < u$, it is legitimate to use the error bound (1.2) for
the value of the estimate $\hat S_n$ at the last sample time $n$. Otherwise, do not
use the bound (1.2) at all.

Fig. 3. Another realization of the empirical estimates $\{\hat S_k\}$ for
$k = 100, 101, \ldots, n = 5000$, plotted together with the screened estimates shown in
bold (where $u = 0.005$ as before). The screened estimates at earlier times are less accurate
than some of the later estimates that are ignored by the screened estimator.

The same interpretation applies to any application of the screened esti- 
mator. On the one hand, screening gives a powerful heuristic for selecting 
times $k$ when the $\hat S_k$ are more likely to be accurate, and it can be used as
a diagnostic tool to actually rule out times k when it is certain that the 
empirical distribution of the samples is not close to the true underlying dis- 
tribution. On the other hand, in cases when it is required that the error 
probability be precisely quantified, the sampling times cannot be random 
and they have to be decided upon in advance. 

The second point is based on some results we observed in simulation 
experiments, indicating that the sampling times k picked out by the screened 
estimator are not all equally reliable: Naturally, since the probability of error 
decays exponentially, earlier times correspond to much looser error bounds, 
while the error probability of estimates obtained during later times can be 
more tightly controlled. This is illustrated by the (rather atypical but not 
impossibly rare) results shown in Figure 3. 

From the probabilistic point of view, the following calculation gives a quick
explanation for the fact that the screened estimator leads to exponential
error bounds in great generality (although this is not how the actual error
bounds in Section 3 are obtained). Suppose the $\{\hat S_k\}$ are used to estimate the
mean $\mu = E(F(X))$ for some $F$, while we know $\nu = E(U(X))$ for a different
function $U$ that dominates $F$ in that $\operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$, for
all $\beta > 0$. Although $F(X)$ may be heavy-tailed, in which case the $\{\hat S_k\}$
themselves will not admit exponential error bounds, the error probability of
the screened estimator is bounded by

$$\Pr\{\hat S_n - \mu > \epsilon \text{ and } |\hat T_n - \nu| < u\}
  \le \Pr\Big\{\frac{1}{n}\sum_{i=1}^{n}[F(X_i) - \beta U(X_i)] - (\mu - \beta\nu) > \epsilon - \beta u\Big\}. \tag{1.3}$$

Since $E[F(X) - \beta U(X)] = \mu - \beta\nu$, for $0 < \beta < \epsilon/u$ this is a large deviations
probability for the right tail of the partial sums of the random variables
$\{F(X_i) - \beta U(X_i)\}$, which are (a.s.) bounded above. It is, therefore, no
surprise that this probability is exponentially small.
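The inclusion of events behind (1.3) can be written out in three steps. The sketch below fixes $\beta > 0$ and uses $\hat S_n$, $\hat T_n$ for the empirical averages of the $F(X_i)$ and the $U(X_i)$, respectively; only the one-sided part $\hat T_n - \nu < u$ of the screening event is used.

```latex
% Assumption: \hat S_n = (1/n) \sum_i F(X_i) and \hat T_n = (1/n) \sum_i U(X_i), with \beta > 0 fixed.
\begin{align*}
\{\hat S_n - \mu > \epsilon,\ |\hat T_n - \nu| < u\}
  &\subseteq \{\hat S_n - \mu > \epsilon,\ \beta(\hat T_n - \nu) < \beta u\} \\
  &\subseteq \{(\hat S_n - \mu) - \beta(\hat T_n - \nu) > \epsilon - \beta u\} \\
  &= \Big\{\frac{1}{n}\sum_{i=1}^{n}\,[F(X_i) - \beta U(X_i)] - (\mu - \beta\nu) > \epsilon - \beta u\Big\}.
\end{align*}
```

The threshold $\epsilon - \beta u$ is strictly positive exactly when $\beta < \epsilon/u$, which is why that range of $\beta$ is the relevant one.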

1.1. Screening and control variates. A well-known and commonly used
technique for reducing the variance of an estimator in classical Monte Carlo
simulation is the method of control variates; see, for example, the standard
texts [7, 11, 13] or the paper [9] for extensive discussions. This method
is based on the observation that in many applications (exactly as in our
setting) there is a function $U$ whose expectation $\nu = E[U(X)]$ is known.
Therefore, replacing the estimates $\{\hat S_k\}$ for $\mu = E[F(X)]$ with the control
variate estimates,

$$\tilde S_k := \frac{1}{k}\sum_{i=1}^{k}\big(F(X_i) - \beta[U(X_i) - \nu]\big), \qquad 1 \le k \le n,$$

yields an estimator which is still consistent (since the additional term has
zero mean) but whose variance is different from that of $\{\hat S_k\}$. In fact,
choosing (or estimating) the value of the constant $\beta$ appropriately always leads
to an estimator with strictly reduced variance, as long as $F(X)$ and $U(X)$
are correlated random variables.
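As an illustration of the control variate construction, here is a minimal sketch. It relies on the standard fact that $\operatorname{Var}(F(X) - \beta U(X))$ is minimized at $\beta = \operatorname{Cov}(F(X),U(X))/\operatorname{Var}(U(X))$; the function names are hypothetical, and the numerical inputs are the variance 20/9 and covariance 20/21 quoted in the Introduction for $F(x) = x^{3/4}$, $U(x) = x$.

```python
def control_variate_estimate(samples, F, U, nu, beta):
    """Average of F(x) - beta * (U(x) - nu) over the samples."""
    return sum(F(x) - beta * (U(x) - nu) for x in samples) / len(samples)

def variance_minimizing_beta(cov_fu, var_u):
    """The beta minimizing Var(F(X) - beta * U(X)): Cov(F(X), U(X)) / Var(U(X))."""
    return cov_fu / var_u

# Values quoted in the Introduction: Cov(X^(3/4), X) = 20/21 and Var(X) = 20/9.
beta_star = variance_minimizing_beta(20 / 21, 20 / 9)
print(beta_star)  # equals 3/7 up to rounding

# With beta = 0 the control variate estimate reduces to the plain average of F.
print(control_variate_estimate([1.0, 3.0], lambda x: x * x, lambda x: x, nu=2.0, beta=0.0))
```

In practice the covariance is rarely known exactly and $\beta$ is estimated from the data, as the text notes.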

This technique is widely employed in practice; see the references above as
well as [1, 6]; also the text [8] contains many examples of current interest in
computational finance and pointers to the relevant literature. In particular,
functions $U$ that appear in applications as control variates provide a natural
class of screening functions that can be incorporated in the design of the
screened estimator.

An interesting connection between these two methods (control variates
and screening) is seen in that the second probability in (1.3) above is exactly
the error probability for the control variate estimates $\{\tilde S_k\}$. More generally,
in cases where control variates (or some other method) are used to reduce
the variance of the $\{\hat S_k\}$, we view screening as a sampling rule which
complements (and does not replace) variance reduction or variance estimation
techniques. The connection between screening and variance reduction is an
intriguing one, and will be explored in subsequent work [10].

1.2. Outline and summary of results. The general results in Sections 2
and 3 parallel those presented for the example in Proposition 1.1.
Theorems 2.1, 2.2 and 2.3 offer a theoretical description of the large deviations
behavior of the screened estimator's error probability, both asymptotically
and for finite $n$. The only assumptions necessary are that $E(F(X))$ is finite,
and that the mean $E(U(X))$ is known for some function $U$ which dominates
$F$ in that $\operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$ for all $\beta > 0$. Then the error
probability admits a nontrivial exponential bound, regardless of the distribution of
$X$. The exponent can be expressed either as a Fenchel-Legendre transform
or in terms of relative entropy, and the relative entropy formulation leads to
an elegant geometric explanation for the fact that the screened estimator's
error probability always decays exponentially.

When $F(X)$ and $U(X)$ also have finite second moments, and assuming
that the variance $\operatorname{Var}(U(X))$ is known, in Theorem 3.1 we give an explicit,
easily computable, exponential bound for the error probability. The bound
holds for all $n \ge 1$, and the exponent is of order $\epsilon^2$ for small $\epsilon, u$. Also, a
more refined bound is given when the value of the covariance between $F(X)$
and $U(X)$ is available. These are the main results of this paper.

In Section 4 we consider the case when $F(X)$ and $U(X)$ have finite
exponential moments, so that the standard estimator $\{\hat S_k\}$ already has an
exponentially vanishing error probability. Theorem 4.1 shows that the screened
estimator's error probability decays at a strictly faster exponential rate, and
the difference between the exponents is more precisely quantified in
Theorem 4.2: It is shown to be of order $\epsilon^2$ for small $\epsilon, u$, and this is used to draw
a different heuristic connection between screening and variance reduction
techniques.

Section 5 contains the proofs of Theorems 2.1, 2.2 and 2.3. 

Finally, we mention that the screening idea can also be applied in the 
context of more complex problems arising in Markov chain Monte Carlo 
(MCMC) simulation. Such generalizations are by no means immediate, and 
they will be explored in subsequent work. 

2. Large deviations. In this section we give a theoretical explanation for
the (sometimes dramatic) performance improvement offered by the screened
estimator. For explicit bounds like those presented in the Introduction, see
Section 3.

Let $X, X_1, X_2, \ldots$ be i.i.d. random variables with common law given by
the probability measure $P$ on $\mathbb{R}$. Given a function $F : \mathbb{R} \to \mathbb{R}$ whose mean is
to be estimated by the empirical averages of the $\{F(X_i)\}$, for the purposes of
this section only we consider a slightly simplified version of the screened
estimator: Assuming the mean $\nu = E(U(X))$ of a different function $U : \mathbb{R} \to \mathbb{R}$
is known, we examine the screened estimator based on the one-sided
screening event, $\{\sum_{i=1}^{n} U(X_i) - n\nu < nu\}$, for some $u > 0$. To avoid cumbersome
notation, write $S_n := \sum_{i=1}^{n} F(X_i)$ and $T_n := \sum_{i=1}^{n} U(X_i)$, $n \ge 1$.

In the first result, Theorem 2.1 below, we obtain representations for the
asymptotic exponents of the error probability, both for the standard
estimator and for the screened estimator. The exponents are expressed in terms of
relative entropy, in the spirit of Sanov's theorem; see [3, 4, 14]. Recall that
the relative entropy between two probability measures $P$ and $Q$ on the same
space is defined by

$$H(P\|Q) := \begin{cases} \displaystyle\int dP \log\frac{dP}{dQ}, & \text{when } \dfrac{dP}{dQ} \text{ exists}, \\[2mm] \infty, & \text{otherwise}. \end{cases}$$

Theorem 2.1 follows from the more general results in Theorems 2.2 and 2.3
below; its proof is given in Section 5.
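For distributions on a finite set the relative entropy reduces to a finite sum. The following minimal sketch (natural logarithms, with probability vectors represented as Python lists, both choices made here for illustration) mirrors the two cases of the definition.

```python
import math

def relative_entropy(p, q):
    """H(P||Q) for two probability vectors on the same finite set (natural logarithm)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue          # the convention 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf   # P is not absolutely continuous with respect to Q
        total += pi * math.log(pi / qi)
    return total

print(relative_entropy([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(relative_entropy([1.0, 0.0], [0.5, 0.5]))   # equals log 2
print(relative_entropy([0.5, 0.5], [1.0, 0.0]))   # no density dP/dQ exists
```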

Theorem 2.1 (Sanov asymptotics). Suppose the functions $F : \mathbb{R} \to [0,\infty)$
and $U : \mathbb{R} \to \mathbb{R}$ have finite first moments $\mu := E[F(X)]$, $\nu := E[U(X)]$, and
also finite second moments, $E[F(X)^2]$, $E[U(X)^2]$. Assume that $F(X)$ is
heavy-tailed in that $E[e^{\theta F(X)}] = \infty$ for all $\theta > 0$, and that $U$ dominates $F$ in
that $m(\beta) := \operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$ for all $\beta > 0$. Then:

(i) The error probability of the standard estimator decays
subexponentially: For all $\epsilon > 0$,

$$\lim_{n\to\infty} \frac{1}{n}\log\Pr\{S_n - n\mu > n\epsilon\} = -\inf_{Q\in\Sigma} H(Q\|P) = 0,$$

where $\Sigma$ is the set of all probability measures $Q$ on $\mathbb{R}$ such that
$\int F\,dQ - \mu > \epsilon$.

(ii) The error probability of the screened estimator decays exponentially:
For all $\epsilon, u > 0$,

$$\lim_{n\to\infty} \frac{1}{n}\log\Pr\{S_n - n\mu > n\epsilon \text{ and } T_n - n\nu < nu\} = -\inf_{Q\in E} H(Q\|P) < 0,$$

where $E \subset \Sigma$ is the set of all probability measures $Q$ on $\mathbb{R}$ such that
$\int F\,dQ - \mu > \epsilon$ and $\int U\,dQ - \nu < u$.

Therefore, while the (asymptotic) exponent of the error probability of the
standard estimator is equal to zero, the exponent of the error probability
of the screened estimator is strictly positive. Although this situation is only
possible when the relative entropy is minimized over an infinite-dimensional
space of measures [in that the exponent $\inf_{Q\in\Sigma} H(Q\|P)$ cannot be zero
when $X$ takes only finitely many values], it is perhaps illuminating to offer
a geometric description.

The large oval in the first diagram in Figure 4 depicts the space of all
probability measures $Q$ on $\mathbb{R}$, and the small "cap" on the left is the set $\Sigma$
of those $Q$ with $\int F\,dQ - \mu > \epsilon$. The gray shaded area corresponds to the
"smallest" subset of $\Sigma$ such that the infimum of $H(Q\|P)$ over this subset
is zero. (Of course this set is not exactly well defined, but it does
convey the correct intuition.) In the second diagram, the black shaded area
corresponds to the set $E$, formed by the intersection of $\Sigma$ with the half-space
$\mathcal{H} = \{Q : \int U\,dQ - \nu < u\}$. Note that $\mathcal{H}$ is a "typical" set under $P$, in that
$P \in \mathcal{H}$ and the empirical measure of the $\{X_i\}$ will eventually concentrate
there by the law of large numbers. Nevertheless, when $\Sigma$ is intersected with
$\mathcal{H}$ to give $E$, Theorem 2.1 tells us that it excludes the part of $\Sigma$ which is
close to $P$ in relative entropy (the gray area), and this forces the result of
the minimization over $Q \in E$ to be strictly positive; the limiting minimizer
$Q^*$, assuming it exists, is shown as lying on the common boundary of $\Sigma$
and $\mathcal{H}$.

I. KONTOYIANNIS AND S. P. MEYN

The following two theorems give a more precise and complete description
of the large deviations properties of the probabilities of interest. Formally,
they simply establish a version of Cramér's theorem in the present setting.
What is perhaps somewhat surprising is that this is done without any
assumption of finite exponential moments. In the presence of the domination
condition $m(\beta) < \infty$, it turns out that it is only necessary to assume finite
first (and in some cases second) moments for $F(X)$ and $U(X)$.

The results in Theorems 2.2 and 2.3 will form the basis for the
development of the bounds in Section 3. Their proofs are given in Section 5.



Theorem 2.2 (Exponential upper bounds). Suppose the functions $F : \mathbb{R} \to \mathbb{R}$
and $U : \mathbb{R} \to \mathbb{R}$ are such that $\mu := E[F(X)]$ and $\nu := E[U(X)]$ are both
finite, and that $m(\beta) := \operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$ for all $\beta > 0$. Then for
all $\epsilon, u > 0$:

(i) $\Pr\{S_n - n\mu > n\epsilon,\ T_n - n\nu < nu\} \le \exp\{-nH(E\|P)\}$, for all $n \ge 1$,
where,

$$H(E\|P) := \inf\{H(Q\|P) : Q \in E\}, \tag{2.1}$$

and $E$ is the set of all probability measures $Q$ on $\mathbb{R}$ such that $\int F\,dQ - \mu > \epsilon$
and $\int U\,dQ - \nu < u$.

(ii) $\Pr\{S_n - n\mu > n\epsilon,\ T_n - n\nu < nu\} \le \exp\{-n\Lambda_+^*(\epsilon,u)\}$, for all $n \ge 1$,
where,

$$\Lambda_+^*(\epsilon,u) := \sup_{\theta_1,\theta_2 \ge 0}\{\theta_1(\mu+\epsilon) - \theta_2(\nu+u) - \Lambda_+(\theta_1,\theta_2)\},$$

with $\Lambda_+(\theta_1,\theta_2) := \log E[\exp\{\theta_1 F(X) - \theta_2 U(X)\}]$, $\theta_1, \theta_2 \ge 0$.

(iii) The rate function $\Lambda_+^*(\epsilon,u)$ is strictly positive.

Fig. 4. Geometric illustration of the fact that $\inf_{Q\in\Sigma} H(Q\|P) = 0$ whereas
$\inf_{Q\in E} H(Q\|P)$ is strictly positive.
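To see how the exponent in part (ii) of Theorem 2.2 can be evaluated numerically, the sketch below approximates the supremum by a grid search for a toy distribution with finite support. The four-point distribution, the grid and the function name are hypothetical; with finite support the standard estimator already admits exponential bounds, so this only illustrates the computation, not the heavy-tailed phenomenon. Since a grid search can only underestimate the supremum, the returned value is still a valid exponent in the upper bound.

```python
import math

def chernoff_exponent(xs, probs, F, U, eps, u, grid):
    """Grid-search lower approximation of
    sup over theta1, theta2 >= 0 of theta1*(mu+eps) - theta2*(nu+u) - log E[exp(theta1*F - theta2*U)].
    """
    mu = sum(p * F(x) for x, p in zip(xs, probs))
    nu = sum(p * U(x) for x, p in zip(xs, probs))
    best = 0.0  # the value of the objective at theta1 = theta2 = 0
    for t1 in grid:
        for t2 in grid:
            log_mgf = math.log(sum(p * math.exp(t1 * F(x) - t2 * U(x))
                                   for x, p in zip(xs, probs)))
            best = max(best, t1 * (mu + eps) - t2 * (nu + u) - log_mgf)
    return best

# Hypothetical four-point distribution, chosen only for illustration.
xs, probs = [1.0, 2.0, 4.0, 8.0], [0.5, 0.25, 0.15, 0.10]
grid = [0.1 * j for j in range(51)]  # theta values 0.0, 0.1, ..., 5.0
rate = chernoff_exponent(xs, probs, lambda x: x ** 0.75, lambda x: x, 0.5, 0.1, grid)
print(rate)
```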

Theorem 2.3 (Large Deviations Asymptotics). Under the assumptions
of Theorem 2.2, if, in addition, $F(X)$ and $U(X)$ have finite second moments,
then for all $\epsilon, u > 0$,

$$\lim_{n\to\infty} \frac{1}{n}\log\Pr\{S_n - n\mu > n\epsilon,\ T_n - n\nu < nu\} = -\Lambda_+^*(\epsilon,u), \tag{2.2}$$

and $\Lambda_+^*(\epsilon,u)$ coincides with the rate function $H(E\|P)$ given in (2.1).

3. Bounds for arbitrary tails. Let $X, X_1, X_2, \ldots$ be i.i.d. random
variables. Given functions $F, U : \mathbb{R} \to \mathbb{R}$, write $S_n = \sum_{i=1}^{n} F(X_i)$ and
$T_n = \sum_{i=1}^{n} U(X_i)$.
We begin by restating part of Theorem 2.2. Since the two-sided error event
$\{S_n - n\mu > n\epsilon,\ |T_n - n\nu| < nu\}$ is contained in $\{S_n - n\mu > n\epsilon,\ T_n - n\nu < nu\}$,
we have:

Corollary 3.1. Suppose the functions $F : \mathbb{R} \to \mathbb{R}$ and $U : \mathbb{R} \to \mathbb{R}$ are
such that $\mu := E[F(X)]$ and $\nu := E[U(X)]$ are both finite, and that
$m(\beta) := \operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$ for all $\beta > 0$. Then for all $n \ge 1$ and all $\epsilon, u > 0$,

$$\Pr\{S_n - n\mu > n\epsilon,\ |T_n - n\nu| < nu\} \le e^{-n\Lambda_+^*(\epsilon,u)},$$

where the exponent, $\Lambda_+^*(\epsilon,u)$, is given by

$$\sup_{\theta_1 \ge 0,\ \theta_2 \ge 0}\{\theta_1(\mu+\epsilon) - \theta_2(\nu+u) - \log E[\exp\{\theta_1 F(X) - \theta_2 U(X)\}]\},$$

and is strictly positive.

Remarks. 1. An exactly analogous result holds if instead of $m(\beta)$ we
assume that $\operatorname{ess\,sup}[F(X) + \beta U(X)] < \infty$ for all $\beta > 0$. Then, repeating the
Chernoff argument in the proof of Theorem 2.2 for the one-sided error event
$\{S_n - n\mu > n\epsilon,\ T_n - n\nu > -nu\}$ leads to the same bound, but with the
exponent, $\Gamma_+^*(\epsilon,u)$, given by

$$\sup_{\theta_1 \ge 0,\ \theta_2 \ge 0}\{\theta_1(\mu+\epsilon) + \theta_2(\nu-u) - \log E[\exp\{\theta_1 F(X) + \theta_2 U(X)\}]\},$$

and $\Gamma_+^*(\epsilon,u)$ can be similarly seen to be strictly positive.

2. Replacing $F$ by $-F$ yields a corresponding result for the left tail. If
$\operatorname{ess\,inf}[F(X) + \beta U(X)] > -\infty$ for all $\beta > 0$,

$$\Pr\{S_n - n\mu < -n\epsilon,\ |T_n - n\nu| < nu\} \le e^{-n\Lambda_-^*(\epsilon,u)},$$

where $\Lambda_-^*(\epsilon,u)$ is given by

$$\sup_{\theta_1 \ge 0,\ \theta_2 \ge 0}\{\theta_1(-\mu+\epsilon) + \theta_2(-\nu-u) - \log E[\exp\{-\theta_1 F(X) - \theta_2 U(X)\}]\},$$

and is strictly positive. Moreover, in view of the previous remark, an
analogous bound holds under the assumption that $\operatorname{ess\,inf}[F(X) - \beta U(X)] > -\infty$
for all $\beta > 0$; in this case the exponent is replaced by

$$\Gamma_-^*(\epsilon,u) = \sup_{\theta_1 \ge 0,\ \theta_2 \ge 0}\{\theta_1(-\mu+\epsilon) + \theta_2(\nu-u) - \log E[\exp\{-\theta_1 F(X) + \theta_2 U(X)\}]\},$$

which is also strictly positive.

3. Combining the observations in Remarks 1 and 2 immediately yields
a bound on the two-sided deviations of $\{S_n\}$. If both $\mu = E[F(X)]$ and
$\nu = E[U(X)]$ are finite, and also both $\operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$ and
$\operatorname{ess\,inf}[F(X) + \beta U(X)] > -\infty$, for all $\beta > 0$, then for all $n \ge 1$ and all
$\epsilon, u > 0$,

$$\Pr\{|S_n - n\mu| > n\epsilon,\ |T_n - n\nu| < nu\} \le e^{-n\Lambda_+^*} + e^{-n\Lambda_-^*} \le 2e^{-n\min\{\Lambda_+^*,\,\Lambda_-^*\}}, \tag{3.1}$$

where $\Lambda_+^*$ and $\Lambda_-^*$ are strictly positive. Although this double domination
assumption may appear severe, it is generally quite easy to find functions
$U$ that will dominate a given $F$ appropriately. For example, if $F(x) = x$ we
can simply take $U(x) = x^2$, or, more generally, $U(x) = x^{2k}$ for any positive
integer $k$, assuming appropriately high moments exist.

4. In Remarks 1 and 2, two different domination assumptions were shown 
to give a bound on the right tail of the partial sums of F, and two more 
assumptions do the same for the left tail. Any of their four different combi- 
nations gives a bound similar to (3.1), with the appropriate combination of 
exponents. 

If $F$ and $U$ also have finite second moments, an easily applicable,
quantitative version of Corollary 3.1 can be obtained. The gist of the argument
is the use of the boundedness of $[F(X) - \beta U(X)]$ in order to compute an
explicit lower bound for the exponent $\Lambda_+^*(\epsilon,u)$.

Theorem 3.1. Suppose $E[F(X)] = E[U(X)] = 0$, $\operatorname{Var}(F(X)) \le 1$,
$\operatorname{Var}(U(X)) = 1$, and that $m(\beta) := \operatorname{ess\,sup}[F(X) - \beta U(X)] < \infty$ for all $\beta > 0$.
Then the following hold for all $n \ge 1$:

(i) For any $\epsilon, u > 0$, if there exists $\beta > 0$ such that $m(\beta) < \epsilon - \beta u$, then

$$\Pr\{S_n > n\epsilon,\ |T_n| < nu\} = 0.$$

(ii) For any $\epsilon, u > 0$,

$$\log\Pr\{S_n > n\epsilon,\ |T_n| < nu\} \le -2n \sup_{\alpha\in(0,1)}\Big[\frac{m\cdot(1-\alpha)}{m^2 + 1 + (\alpha\epsilon/u)^2 - 2\alpha\gamma\epsilon/u}\Big]^2\epsilon^2, \tag{3.2}$$

where $m := m(\frac{\alpha\epsilon}{u})$ and $\gamma := E[F(X)U(X)]$ is the covariance between $F(X)$
and $U(X)$.

(iii) Let $K > 0$ be arbitrary. Then for any $\epsilon > 0$ and any $0 < u \le K\epsilon$,

$$\log\Pr\{S_n > n\epsilon,\ |T_n| < nu\} \le -\frac{n}{2}\Big[\frac{M}{M^2 + (1 + 1/(2K))^2}\Big]^2\epsilon^2, \tag{3.3}$$

where $M = m(\frac{1}{2K})$.
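The exponents in (3.2) and (3.3) are straightforward to evaluate once a bound on $m(\beta)$ is available. The sketch below reads them as $\log\Pr \le -n \times \text{exponent}$; the function names, the grid of $\alpha$ values and the particular function `m` used in the check are hypothetical. Restricting the supremum in (3.2) to a finite set of $\alpha$ can only decrease the exponent, so the resulting bound remains valid.

```python
def exponent_3_2(eps, u, m, gamma, alphas):
    """Largest value, over the given alphas, of
    2 * [m(b) * (1 - a) / (m(b)^2 + 1 + b^2 - 2*b*gamma)]^2 * eps^2, where b = a * eps / u."""
    best = 0.0
    for a in alphas:
        b = a * eps / u
        mb = m(b)
        ratio = mb * (1.0 - a) / (mb ** 2 + 1.0 + b ** 2 - 2.0 * b * gamma)
        best = max(best, 2.0 * ratio ** 2 * eps ** 2)
    return best

def exponent_3_3(eps, K, M):
    """(1/2) * [M / (M^2 + (1 + 1/(2K))^2)]^2 * eps^2, with M = m(1/(2K))."""
    return 0.5 * (M / (M ** 2 + (1.0 + 1.0 / (2.0 * K)) ** 2)) ** 2 * eps ** 2

# Consistency check: (3.3) is (3.2) with u = K*eps, alpha = 1/2 and the worst case gamma = -1.
m = lambda b: 1.0 + b  # hypothetical domination bound, for illustration only
eps, K = 0.3, 2.0
lhs = exponent_3_2(eps, K * eps, m, gamma=-1.0, alphas=[0.5])
rhs = exponent_3_3(eps, K, m(1.0 / (2.0 * K)))
print(lhs, rhs)
```

A finer grid of $\alpha$ values, together with the actual covariance $\gamma$ when it is known, typically gives a noticeably larger exponent than (3.3).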



Remarks. 1. The assumption that $\operatorname{Var}(F(X)) \le 1$ in Theorem 3.1 seems
to require that we know an upper bound on the variance of $F$ in advance,
but in practice this is easily circumvented. In specific applications, we
typically have a function $U$ that dominates $F$ in that, not only $m(\beta) < \infty$ for
all $\beta > 0$, but also there are finite constants $C_1, C_2$ such that

$$|F(x)| \le C_1 U(x) + C_2 \qquad \text{for all } x \text{ in the support of } X. \tag{3.4}$$

This is certainly the case for the example presented in the Introduction, as
well as in the examples in Remark 3 above. A bound on the variance of
$F(X)$ is obtained from (3.4), $\operatorname{Var}(F(X)) \le C_1^2\operatorname{Var}(U(X)) + C_2^2$. This and
several other issues arising in the application of Theorem 3.1 are illustrated
in detail in the proof of Proposition 1.1.

2. As will become clear from its proof, to use the bounds in Theorem 3.1 it
is not necessary to know $m(\beta)$ exactly; any upper bound on the
$\operatorname{ess\,sup}[F(X) - \beta U(X)]$ can be used in place of $m(\beta)$.

3. When $F(x) = \tilde F(x) - \mu$, where $\mu$ is the unknown mean to be estimated,
it is hard to imagine that the exact value of the covariance $\gamma$ may be known
without knowing $\mu$. But, similarly to $m(\beta)$, in order to apply (3.2) it suffices
to have an upper bound on $\gamma$, and such estimates are often easy to obtain.
See the proof of Proposition 1.1 for an illustration.

4. The main difference between the bounds in (3.2) and (3.3) is that (3.3)
only requires knowledge of the first and second moments of $U(X)$, whereas
(3.2) also depends on $\gamma$. The bound in (3.3) is attractive because it is simple
and it clearly shows that the exponent is of order $\epsilon^2$ for small $\epsilon$. Its main
disadvantage is that it often leads to rather conservative estimates, since
it ignores the potential correlation between $F(X)$ and $U(X)$ and it follows
from (3.2) by an arbitrary choice for the parameter $\alpha$. The exponent in (3.2),
on the other hand, despite its perhaps somewhat daunting appearance, is
often easy to estimate and it typically gives significantly better results. This
too is clearly illustrated by the results (and the proof) of Proposition 1.1.

5. Considering $-F$ in place of $F$ gives corresponding bounds for the lower
tail of the partial sums $S_n$, under the assumption that $\operatorname{ess\,inf}[F(X) + \beta U(X)]$
be finite for all $\beta > 0$. As in (3.1), these can be combined with the
corresponding results in Theorem 3.1 to give explicit exponential bounds for the
two-sided deviation event, $\{|S_n| > n\epsilon,\ |T_n| < nu\}$.

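Proof of Theorem 3.1. As already noted in (1.3) in the Introduction,
for any $\beta > 0$,

$$\Pr\{S_n > n\epsilon,\ |T_n| < nu\} \le \Pr\Big\{\frac{1}{n}\sum_{i=1}^{n}[F(X_i) - \beta U(X_i)] > \epsilon - \beta u\Big\}.$$

If the essential supremum $m(\beta)$ of the random variables $[F(X_i) - \beta U(X_i)]$
which are being averaged is smaller than the threshold $\epsilon - \beta u$, then the
above event is empty and its probability is zero, establishing (i).

Recall the definitions of $\Lambda_+$ and $\Lambda_+^*$ in Theorem 2.2. With any $\alpha \in (0,1)$,
taking $\theta_2 = \alpha\epsilon\theta_1/u$ in the definition of $\Lambda_+^*(\epsilon,u)$, Corollary 3.1 yields

$$\Pr\{S_n > n\epsilon,\ |T_n| < nu\} \le \exp\Big\{-n\sup_{\theta \ge 0}[\theta(1-\alpha)\epsilon - \Lambda_0(\theta)]\Big\}, \tag{3.5}$$

where $\Lambda_0(\theta) := \Lambda_+(\theta, \frac{\alpha\epsilon\theta}{u})$. Write $s^2 := \operatorname{Var}(F(X)) \le 1$, define the random
variable $Y := F(X) - \frac{\alpha\epsilon}{u}U(X)$, and note that $Y \le m := m(\frac{\alpha\epsilon}{u})$ a.s., $E(Y) = 0$,
and

$$\operatorname{Var}(Y) = s^2 + \Big(\frac{\alpha\epsilon}{u}\Big)^2 - \frac{2\alpha\epsilon\gamma}{u} \le 1 + \Big(\frac{\alpha\epsilon}{u}\Big)^2 - \frac{2\alpha\epsilon\gamma}{u} =: \sigma^2.$$

Throughout the rest of the proof we assume, without loss of generality, that
$m > 0$. [We know $m \ge 0$ by our assumptions, so if $m = 0$, then $\Lambda_0(\theta) = 0$ and
the supremum in (3.5) equals $+\infty$, implying that the probability of interest
equals zero and that all the bounds stated in the theorem are trivially valid.]
Now we apply Bennett's lemma [4], Lemma 2.4.1, to get an upper bound
on $\Lambda_0(\theta)$ as

$$\Lambda_0(\theta) \le \log\Big[\frac{m^2}{m^2+\sigma^2}\,e^{-\theta\sigma^2/m} + \frac{\sigma^2}{m^2+\sigma^2}\,e^{\theta m}\Big].$$

Using this and replacing $\theta$ by $\lambda/m$, the supremum in (3.5) is bounded below
by

$$I(\epsilon,\alpha,u) := \sup_{\lambda \ge 0}\Big\{\lambda x - \log\Big[\frac{e^{-\lambda\tau^2} + \tau^2 e^{\lambda}}{1+\tau^2}\Big]\Big\},$$

where

$$x := \frac{(1-\alpha)\epsilon}{m} \qquad \text{and} \qquad \tau^2 := \frac{\sigma^2}{m^2} = \frac{1 + ((\alpha\epsilon)/u)^2 - (2\alpha\epsilon\gamma)/u}{m^2}.$$

We consider the following cases:

(i) If there exists $\alpha \in (0,1)$ for which $(1-\alpha)\epsilon > m(\frac{\alpha\epsilon}{u})$, then with $\beta = \alpha\epsilon/u$
we have $m(\beta) < \epsilon - \beta u$, which we already showed implies that the probability
of interest is zero.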

(ii) In view of (i), we assume without loss of generality that
$(1-\alpha)\epsilon \le m(\frac{\alpha\epsilon}{u})$, for all $\alpha \in (0,1)$. For any $\alpha \in (0,1)$, in the definition of
$I(\epsilon,\alpha,u)$ we may pick

$$\lambda = \frac{1}{1+\tau^2}\log\Big[\frac{\tau^2 + x}{\tau^2(1-x)}\Big],$$

which, after some algebra, yields

$$I(\epsilon,\alpha,u) \ge H\Big(\frac{x+\tau^2}{1+\tau^2}\,\Big\|\,\frac{\tau^2}{1+\tau^2}\Big),$$

where $H(y\|z) := y\log\frac{y}{z} + (1-y)\log\frac{1-y}{1-z}$ denotes the relative entropy
between the Bernoulli($y$) and the Bernoulli($z$) distributions. This relative
entropy is, in turn, by a standard argument (e.g., using Pinsker's inequality;
cf. [2], Theorem 4.1), bounded below by $2(y-z)^2$. Therefore,

$$\frac{1}{n}\log\Pr\{S_n > n\epsilon,\ |T_n| < nu\}
  \le -\sup_{\alpha\in(0,1)}\frac{2x^2}{(1+\tau^2)^2}
  = -\sup_{\alpha\in(0,1)}\frac{2(1-\alpha)^2\epsilon^2}{m^2[1 + (1 + ((\alpha\epsilon)/u)^2 - (2\alpha\gamma\epsilon)/u)/m^2]^2},$$

proving part (ii).
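The last step of part (ii) uses the lower bound $H(y\|z) \ge 2(y-z)^2$ with $y = (x+\tau^2)/(1+\tau^2)$ and $z = \tau^2/(1+\tau^2)$, so that $y - z = x/(1+\tau^2)$. A small numerical sanity check follows; the values of $x$ and $\tau^2$ are hypothetical and the function names are illustrative.

```python
import math

def bernoulli_relative_entropy(y, z):
    """H(y||z) = y*log(y/z) + (1-y)*log((1-y)/(1-z)), for 0 <= y <= 1 and 0 < z < 1."""
    total = 0.0
    if y > 0.0:
        total += y * math.log(y / z)
    if y < 1.0:
        total += (1.0 - y) * math.log((1.0 - y) / (1.0 - z))
    return total

def pinsker_lower_bound(y, z):
    """The lower bound 2*(y - z)^2 on H(y||z)."""
    return 2.0 * (y - z) ** 2

x, tau2 = 0.3, 2.0  # hypothetical values with 0 < x < 1 and tau2 > 0
y, z = (x + tau2) / (1.0 + tau2), tau2 / (1.0 + tau2)
h = bernoulli_relative_entropy(y, z)
print(h, pinsker_lower_bound(y, z), 2.0 * x ** 2 / (1.0 + tau2) ** 2)
```

The last two printed numbers agree, which is the identity $2(y-z)^2 = 2x^2/(1+\tau^2)^2$ used in the display above.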

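(iii) Start by taking $u = K\epsilon$. Noting that $|\gamma| \le s \le 1$,

$$1 + \Big(\frac{\alpha\epsilon}{u}\Big)^2 - \frac{2\gamma\alpha\epsilon}{u} \le 1 + \Big(\frac{\alpha\epsilon}{u}\Big)^2 + \frac{2\alpha\epsilon}{u} = \Big(1 + \frac{\alpha}{K}\Big)^2.$$

This and part (ii) show that the exponent of interest is bounded below by

$$2\sup_{\alpha\in(0,1)}\Big[\frac{m\cdot(1-\alpha)}{m^2 + (1+\alpha/K)^2}\Big]^2\epsilon^2,$$

where $m = m(\frac{\alpha\epsilon}{u}) = m(\frac{\alpha}{K})$. Picking $\alpha = 1/2$, this is further bounded below
by

$$\frac{1}{2}\Big[\frac{M}{M^2 + (1 + 1/(2K))^2}\Big]^2\epsilon^2,$$

where $M = m(\frac{1}{2K})$, giving the required result in the case $u = K\epsilon$. Since the
probability in (3.3) can be no bigger for smaller values of $u$, the same bound
holds for all $0 < u \le K\epsilon$. □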

We are now in a position to illustrate how the results of Proposition 1.1 
stated in the introduction can be derived from Theorem 3.1. 



Proof of Proposition 1.1. Part (i) is already stated in (1.1), and
part (ii) is immediate from Corollary 3.1. For parts (iii) and (iv) we will
use the bound in Theorem 3.1(ii). To that end, we begin by defining two
functions $F$, $U$ appropriately.

Recall that, for (iii), we only have the following information: $X$ is supported
on $[1,\infty)$, $E(X) = 5/3$ and $\mathrm{Var}(X) = 20/9$. Then we can define

$$U(x) := \frac{3x}{2\sqrt{5}} - \frac{\sqrt{5}}{2}, \qquad x \ge 1,$$

so that $E(U(X)) = 0$ and $\mathrm{Var}(U(X)) = 1$. Since $\mu \ge 1$ and
$E[(X^{3/4})^2] \le E(X^2) = \mathrm{Var}(X) + E(X)^2 = 5$, we have $\mathrm{Var}(X^{3/4}) \le 5 - 1 = 4$.
Therefore, letting

$$F(x) := (x^{3/4} - \mu)/2, \qquad x \ge 1,$$

we have $E(F(X)) = 0$ and $\mathrm{Var}(F(X)) \le 1$. Using again the fact that $\mu \ge 1$,
we obtain an upper bound on $m(\beta)$ as

$$m(\beta) \le \sup_{x\ge 1}\left[\frac{x^{3/4} - 1}{2} - \frac{3\beta x}{2\sqrt{5}} + \frac{\sqrt{5}\,\beta}{2}\right].$$

This is a particularly easy maximization for $\beta \ge \sqrt{5}/4$, in which case the
maximum is achieved at $x = 1$, giving

$$m(\beta) \le \bar{m}(\beta) := \frac{\beta}{\sqrt{5}} \qquad \text{for } \beta \ge \frac{\sqrt{5}}{4}. \tag{3.6}$$


We can now apply (3.2). Let $S_n$ and $T_n$ be as in Theorem 3.1, and let $\hat{S}_n$
and $\hat{T}_n$ be as in the proposition. For arbitrary $\varepsilon > 0$ and $u = \varepsilon/20$, (3.2) gives

$$-\frac{1}{n}\log\Pr\{\hat{S}_n - \mu > \varepsilon,\ |\hat{T}_n - \nu| < u\} = -\frac{1}{n}\log\Pr\{S_n > n\varepsilon/2,\ |T_n| < 3nu/(2\sqrt{5})\}$$

$$\ge \frac{\varepsilon^2}{2}\sup_{\alpha\in(0,1)}\left[\frac{(1-\alpha)\,m((20\sqrt{5}\alpha)/3)}{m((20\sqrt{5}\alpha)/3)^2 + 1 + ((20\sqrt{5}\alpha)/3)^2 - (40\sqrt{5}\alpha\gamma)/3}\right]^2, \tag{3.7}$$

where $\gamma$ is the (yet unknown) covariance between $F(X)$ and $U(X)$. Restricting
to $\alpha \ge 3/80$, using (3.6) and noting that $|\gamma| \le 1$, the above exponent is
further bounded below by

$$\frac{1}{2}\sup_{3/80\le\alpha<1}\left[\frac{20\alpha(1-\alpha)/3}{((20\alpha)/3)^2 + (1 + (20\sqrt{5}\alpha)/3)^2}\right]^2\varepsilon^2 \ge 0.005\,\varepsilon^2,$$

where the last inequality follows by taking $\alpha = 0.0552083$ in the above
maximization (this $\alpha$ was selected by plotting the graph of the expression to
be maximized and picking $\alpha$ to give a value near the maximum). This
proves (iii) for $u = \varepsilon/20$, but, since the probability of interest is
nondecreasing in $u$, the same bound holds for any $0 < u \le \varepsilon/20$.

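For part (iv), assuming that we also know that $\mathrm{Cov}(X^{3/4}, X) = 20/21$, we
can calculate

$$\gamma := \mathrm{Cov}(F(X),U(X)) = \frac{3}{4\sqrt{5}}\,\mathrm{Cov}(X^{3/4}, X) = \frac{\sqrt{5}}{7}.$$

From the bound in (3.7), restricting as before to $\alpha \ge 3/80$, using (3.6) and
substituting the value of $\gamma$, gives

$$-\log\Pr\{\hat{S}_n - \mu > \varepsilon,\ |\hat{T}_n - \nu| < u\} \ge \frac{n}{2}\sup_{3/80\le\alpha<1}\left[\frac{20\alpha(1-\alpha)/3}{(2400\alpha^2)/9 - (200\alpha)/21 + 1}\right]^2\varepsilon^2 \ge 0.0367\,n\varepsilon^2,$$

where the last inequality follows from choosing $\alpha = 0.0568$. This proves (iv)
for $u = \varepsilon/20$, and, as before, the same bound remains valid for any $0 < u \le \varepsilon/20$. □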
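
The numerical constants in parts (iii) and (iv) can be re-evaluated directly from the two displayed expressions. The sketch below is an illustration only; the function name `exponent_coefficient` and its `gamma` argument are assumptions introduced here. It uses the bound $\bar{m}(\beta) = \beta/\sqrt{5}$ from (3.6) with $\beta = 20\sqrt{5}\alpha/3$, so it is meaningful only for $\alpha \ge 3/80$.

```python
import math

SQRT5 = math.sqrt(5.0)


def exponent_coefficient(alpha, gamma=None):
    """Coefficient c such that the exponent is at least c * eps**2.

    With gamma=None the worst case |gamma| <= 1 is used, as in part (iii);
    otherwise the given covariance is substituted, as in part (iv).
    Assumes alpha >= 3/80 so that (3.6) applies.
    """
    beta = 20.0 * SQRT5 * alpha / 3.0   # beta = alpha * (eps/2) / (3u/(2*sqrt(5))) with u = eps/20
    m_bar = beta / SQRT5                # upper bound (3.6), equal to 20*alpha/3
    if gamma is None:
        denominator = m_bar ** 2 + (1.0 + beta) ** 2
    else:
        denominator = m_bar ** 2 + 1.0 + beta ** 2 - 2.0 * beta * gamma
    return 0.5 * ((1.0 - alpha) * m_bar / denominator) ** 2


if __name__ == "__main__":
    print("part (iii):", exponent_coefficient(0.0552083))
    print("part (iv): ", exponent_coefficient(0.0568, gamma=SQRT5 / 7.0))
```

Under these assumptions the first value comes out slightly above 0.005, and the second is roughly 0.0366, which agrees with the constant quoted for part (iv) up to rounding in the last digit.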

4. Bounds for light tails. As before, let $S_n$, $T_n$ denote the partial sums of
$\{F(X_i)\}$, $\{U(X_i)\}$, respectively, with respect to the i.i.d. random variables
$X, X_1, X_2, \ldots$, with common law $P$. We assume that $E(F(X)) = E(U(X)) = 0$,
and throughout this section we also assume that $F$ and $U$ have finite
exponential moments, that is,

$$\Lambda(\theta) := \log E[e^{\theta F(X)}] < \infty,$$

and $E[e^{\theta U(X)}] < \infty$, for all $\theta \in \mathbb{R}$.
From Corollary 3.1 and the subsequent discussion, we know that the
screened estimator always admits exponential error bounds,

$$\log\Pr\{S_n > n\varepsilon,\ |T_n| < nu\} \le -n\max\{\Lambda_+^*(\varepsilon,u),\ \Gamma_+^*(\varepsilon,u)\}, \qquad n \ge 1, \tag{4.1}$$

for all $\varepsilon, u > 0$, where the exponents $\Lambda_+^*$ and $\Gamma_+^*$ given in Corollary 3.1 and
Remark 1 after Corollary 3.1, respectively, are strictly positive. But in this
setting, the standard estimates $\{\frac{1}{n}S_n\}$ also admit exponential error bounds;
Cramér's theorem states that

$$\log\Pr\{S_n > n\varepsilon\} \le -n\Lambda^*(\varepsilon), \qquad n \ge 1, \tag{4.2}$$

where

$$\Lambda^*(\varepsilon) := \sup_{\theta\ge 0}\{\theta\varepsilon - \Lambda(\theta)\} > 0,$$

for any $\varepsilon > 0$; see [4]. Recall that the exponents in both (4.1) and (4.2) are
asymptotically tight.

In this section we develop conditions under which the screened estimator
offers a nontrivial improvement. That is, even when the error of the standard
estimator decays exponentially, the error of the screened estimator has a
better rate in the exponent. To that end, we look at the difference

$$\Delta(\varepsilon,u) := \max\{\Lambda_+^*(\varepsilon,u),\ \Gamma_+^*(\varepsilon,u)\} - \Lambda^*(\varepsilon).$$

Clearly $\Delta(\varepsilon,u)$ is always nonnegative. Theorem 4.1 says that, as long as the
covariance between $F(X)$ and $U(X)$ is nonzero, $\Delta(\varepsilon,u)$ is strictly positive
for all $\varepsilon, u$ small enough. This is strengthened in Theorem 4.2, where it is
shown that this improvement is a "first-order effect," in that, for small $\varepsilon, u$,
$\Delta(\varepsilon,u)$ and $\max\{\Lambda_+^*(\varepsilon,u), \Gamma_+^*(\varepsilon,u)\}$ are each of order $\varepsilon^2$.

This leads to a different interpretation of the advantage offered by the
screened estimator. Suppose that, for small $\varepsilon, u$, $\Lambda^*(\varepsilon) \approx c\varepsilon^2$, and that
$\max\{\Lambda_+^*(\varepsilon,u), \Gamma_+^*(\varepsilon,u)\} \approx (c + c')\varepsilon^2$, for some $c, c' > 0$. Then for large $n$,
the error of the standard estimator is

$$\Pr\{S_n > n\varepsilon\} \approx e^{-nc\varepsilon^2},$$

whereas for the screened estimator,

$$\Pr\{S_n > n\varepsilon,\ |T_n| < nu\} \approx e^{-n(c+c')\varepsilon^2}.$$

In both cases, we have approximately Gaussian tails. Therefore, roughly
speaking, we may interpret the result of Theorem 4.2 as saying that, as
long as the covariance between $F(X)$ and $U(X)$ is nonzero, the screened
estimates are asymptotically Gaussian with a strictly smaller variance than
the standard estimates.
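
As a hedged illustration of this comparison, consider the special case (an assumption made here for the example, not a setting treated in the text) in which $(F(X), U(X))$ is a pair of standard Gaussians with correlation $\gamma \in (0,1)$. Then $\Lambda(\theta) = \theta^2/2$, so $\Lambda^*(\varepsilon) = \varepsilon^2/2$, and $\log E[\exp\{\theta_1 F(X) - \theta_2 U(X)\}] = (\theta_1^2 + \theta_2^2 - 2\gamma\theta_1\theta_2)/2$. The sketch approximates the supremum over $\theta_1, \theta_2 \ge 0$ of $\theta_1\varepsilon - \theta_2 u$ minus this log-moment generating function on a grid, and compares it with the closed form obtained by setting the gradient to zero, which is valid for $0 < u \le \gamma\varepsilon$.

```python
def lambda_plus(theta1, theta2, gamma):
    """log E exp{theta1*F - theta2*U} for standard Gaussian (F, U) with correlation gamma."""
    return 0.5 * (theta1 ** 2 + theta2 ** 2 - 2.0 * gamma * theta1 * theta2)


def screened_exponent_grid(eps, u, gamma, steps=400):
    """Grid approximation of sup over theta1, theta2 >= 0 of theta1*eps - theta2*u - lambda_plus."""
    theta_max = 2.0 * eps / (1.0 - gamma ** 2)  # comfortably contains the maximizer
    best = 0.0
    for i in range(steps + 1):
        t1 = theta_max * i / steps
        for j in range(steps + 1):
            t2 = theta_max * j / steps
            best = max(best, t1 * eps - t2 * u - lambda_plus(t1, t2, gamma))
    return best


def screened_exponent_closed_form(eps, u, gamma):
    """Closed form of the same supremum, valid for 0 < u <= gamma*eps and 0 < gamma < 1."""
    return (eps ** 2 - 2.0 * gamma * eps * u + u ** 2) / (2.0 * (1.0 - gamma ** 2))


def standard_exponent(eps):
    """Cramer exponent eps^2/2 of the unscreened estimator in the Gaussian case."""
    return 0.5 * eps ** 2


if __name__ == "__main__":
    eps, gamma = 0.1, 0.5
    u = gamma * eps / 4.0
    screened = screened_exponent_closed_form(eps, u, gamma)
    print("grid:       ", screened_exponent_grid(eps, u, gamma))
    print("closed form:", screened)
    print("improvement:", screened - standard_exponent(eps))
```

In this Gaussian case the improvement equals $(\gamma\varepsilon - u)^2/(2(1-\gamma^2))$, which is of order $\varepsilon^2$ when $u$ is proportional to $\varepsilon$.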
Theorem 4.1. Suppose that $E[F(X)] = E[U(X)] = 0$ and that $\gamma := \mathrm{Cov}(F(X),U(X))$
is nonzero. There exists $\varepsilon_0 > 0$ such that, for each $0 < \varepsilon < \varepsilon_0$,
there exists $u_0 = u_0(\varepsilon) > 0$ such that $\Delta(\varepsilon,u) > 0$ for all $u \in (0,u_0)$.

Note that the assumption on the covariance being nonzero cannot be
relaxed. For example, let $X_i = Y_iZ_i$, $i \ge 1$, where $\{Y_i\}$ are i.i.d. nonnegative
random variables, and $\{Z_i\}$ are i.i.d., independent of the $\{Y_i\}$, with each $Z_i = \pm 1$
with probability 1/2. With $F(x) = |x| - E|X_1|$ and $U(x) = \mathrm{sign}(x)$, we
have $F(X_i) = Y_i - E(Y)$ and $U(X_i) = Z_i$, so that $S_n$ and $T_n$ are independent
for all $n \ge 1$. Therefore,

$$\Pr\{S_n > n\varepsilon,\ |T_n| < nu\} = \Pr\{S_n > n\varepsilon\}\,\Pr\{|T_n| < nu\},$$

and since $\lim_n \Pr\{|T_n| < nu\} = 1$, the exponents of the other two probabilities
must be identical.

Whenever $\gamma$ is nonzero, the variances $\sigma^2(F)$, $\sigma^2(U)$ of $F(X)$ and $U(X)$,
respectively, are both nonzero. If $\bar{\Delta}(\varepsilon,u)$ denotes the corresponding difference
of exponents for the normalized functions $F/\sigma(F)$ and $U/\sigma(U)$, then
from the definitions,

$$\Delta(\varepsilon,u) = \bar{\Delta}\left(\frac{\varepsilon}{\sigma(F)},\ \frac{u}{\sigma(U)}\right).$$

Therefore, in order to determine the nature of this difference for small $\varepsilon$ we
can assume, without loss of generality, that $\mathrm{Var}(F(X)) = \mathrm{Var}(U(X)) = 1$.

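Theorem 4.2. Suppose that $E[F(X)] = E[U(X)] = 0$, $\mathrm{Var}(F(X)) = \mathrm{Var}(U(X)) = 1$,
and that $\gamma := \mathrm{Cov}(F(X), U(X))$ is nonzero. Then there exists
$\alpha > 0$ such that

$$\liminf_{\varepsilon\to 0}\ \frac{1}{\varepsilon^2}\,\Delta(\varepsilon,\alpha\varepsilon) > 0.$$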

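In fact, there exists $\varepsilon_0 > 0$ such that

$$\Delta(\varepsilon,\ |\gamma|\varepsilon/4) \ge \frac{\gamma^2\varepsilon^2}{8}$$

for all $\varepsilon \in (0,\varepsilon_0)$.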



Before giving the proofs of the theorems, we collect some technical facts 
in the following lemma. 

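Lemma 4.1. Suppose that $E[F(X)] = E[U(X)] = 0$ and that $\gamma := \mathrm{Cov}(F(X),U(X))$
is nonzero. Then:

(i) $\Lambda$ is smooth on $\mathbb{R}$, $\Lambda(0) = 0$, $\Lambda'(0) = 0$, $\lim_{\theta\to\infty}\Lambda'(\theta) = \bar{F} := \mathrm{ess\,sup}\,F(X)$,
$\Lambda''(0) = \mathrm{Var}(F(X)) > 0$ and $\Lambda''(\theta) > 0$ for all $\theta \in \mathbb{R}$.

(ii) For each $0 < \varepsilon < \bar{F}$ there exists a unique $\theta^* = \theta^*(\varepsilon) > 0$ such that
$\Lambda'(\theta^*) = \varepsilon$ and $\Lambda^*(\varepsilon) = \theta^*\varepsilon - \Lambda(\theta^*)$, where $\theta^* = \theta^*(\varepsilon)$ is strictly increasing
in $\varepsilon \in (0,\bar{F})$.

(iii) Suppose $\mathrm{Var}(F(X)) = 1$. Let $\delta > 0$ be arbitrary (but fixed). Then for
any $\eta > 0$ there exists $\bar{\varepsilon} > 0$ such that

$$\Lambda(\delta\varepsilon) \ge \tfrac{1}{2}(1-\eta)\,\delta^2\varepsilon^2 \qquad \text{for all } \varepsilon < \bar{\varepsilon}.$$

(iv) Suppose $\mathrm{Var}(F(X)) = \mathrm{Var}(U(X)) = 1$. For arbitrary (but fixed) $\beta > 0$,
and for all $t, \varepsilon > 0$, define $f_t(\varepsilon) := \Lambda_+(t\varepsilon, \beta\varepsilon)$. Then for any $\eta > 0$ there
exist $\tau, \bar{\varepsilon} > 0$ such that

$$f_t(\varepsilon) \le \tfrac{1}{2}(1 + \beta^2 - 2\beta\gamma + \eta)\,\varepsilon^2 \qquad \text{for all } \varepsilon < \bar{\varepsilon},\ |t-1| < \tau.$$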

Proof. The statements in (i) and (ii) are well known; see, for example,
[4]. In particular, it is a standard exercise to apply the dominated convergence
theorem in order to justify all the required differentiations, as well as
all the continuity statements and differentiations in the rest of this proof
and in the proofs of Theorems 4.1 and 4.2. For (iii), given $\eta > 0$, since $\Lambda''(\theta)$
is continuous and $\Lambda''(0) = 1$, we can choose $\varepsilon' > 0$ such that $\Lambda''(\varepsilon) \ge 1 - \eta$
for $\varepsilon < \varepsilon'$. The result follows upon expanding $\Lambda$ in a Taylor series around
zero and recalling that $\Lambda(0) = \Lambda'(0) = 0$, with $\bar{\varepsilon} = \varepsilon'/\delta$.

Part (iv) is similar. Let $\eta > 0$ be given. We have $f_t(0) = \Lambda_+(0,0) = 0$,
$f_t'(0) = E[tF(X) - \beta U(X)] = 0$ and $f_t''(\varepsilon)$ is jointly continuous in $t, \varepsilon \ge 0$
with

$$f_t''(0) = \mathrm{Var}(tF(X) - \beta U(X)) = t^2 + \beta^2 - 2t\beta\gamma,$$

where the prime (') now denotes differentiation with respect to $\varepsilon$. Continuity
at the point $(t,\varepsilon) = (1,0)$ implies that we can find $\tau, \bar{\varepsilon} > 0$ such that

$$f_t''(\varepsilon) \le f_1''(0) + \eta = 1 + \beta^2 - 2\beta\gamma + \eta \qquad \text{for all } \varepsilon < \bar{\varepsilon},\ |t-1| < \tau.$$

For any $t$ in that range, expanding $f_t(\varepsilon)$ in a three-term Taylor series around
$\varepsilon = 0$ gives the required result. □

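Proof of Theorem 4.1. From the definitions, it follows that

$$\Delta(\varepsilon,u) \ge \Lambda_+^*(\varepsilon,u) - \Lambda^*(\varepsilon) \ge \sup_{\theta\ge 0}\,[-\theta u - \Lambda_+(\theta^*,\theta) + \Lambda(\theta^*)]. \tag{4.3}$$

The expression inside the last supremum is zero for $\theta = 0$, and our goal is to
show that it is strictly positive for small $\theta$. To that end, define the function

$$g(\theta) := E[F(X)U(X)e^{\theta F(X)}], \qquad \theta \ge 0,$$

and note that it is continuous in $\theta$, and $g(0) = \gamma$. Choose $\theta_0 > 0$ so that
$g(\theta)/\gamma \ge 1/2$ for all $0 \le \theta \le \theta_0$. Let $\varepsilon_0 = \Lambda'(\theta_0) > 0$, and choose and fix an
arbitrary $0 < \varepsilon < \varepsilon_0$, so that $\theta^* = \theta^*(\varepsilon) \in (0,\theta_0)$.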
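First consider the case $\gamma > 0$. Define

$$h(\theta) := \theta^*\varepsilon - \theta u - \Lambda_+(\theta^*,\theta).$$

Then $h(0) = \Lambda^*(\varepsilon)$, and as in (4.3),

$$\Delta(\varepsilon,u) \ge \Lambda_+^*(\varepsilon,u) - \Lambda^*(\varepsilon) \ge \left[\sup_{\theta\ge 0} h(\theta)\right] - \Lambda^*(\varepsilon) \ge h(0) - \Lambda^*(\varepsilon) = 0. \tag{4.4}$$

In order to establish that $\Delta(\varepsilon,u) > 0$ it suffices to show that $h'(0) > 0$.
Computing the derivative of $h$ yields

$$h'(0) = e^{-\Lambda(\theta^*)}E[U(X)e^{\theta^*F(X)}] - u,$$

and expanding the exponential inside the expectation in a two-term Taylor
expansion,

$$h'(0) = \theta^*e^{-\Lambda(\theta^*)}E[F(X)U(X)e^{\tilde{\theta}F(X)}] - u,$$

where $\tilde{\theta} = \tilde{\theta}(X) \in (0,\theta^*)$. Therefore,

$$h'(0) \ge \theta^*e^{-\Lambda(\theta^*)}\inf_{\theta\in(0,\theta^*)} g(\theta) - u \ge \theta^*e^{-\Lambda(\theta^*)}\gamma/2 - u,$$

which is strictly positive, as long as

$$u < u_0 = u_0(\varepsilon) := \theta^*(\varepsilon)\,e^{-\Lambda(\theta^*(\varepsilon))}\,|\gamma|/2.$$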

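The case $\gamma < 0$ is similar, with $\Gamma_+^*$ in place of $\Lambda_+^*$: Replace $h$ by
$h(\theta) = \theta^*\varepsilon - \theta u - \log E[\exp\{\theta^*F(X) + \theta U(X)\}]$, so that $h(0) = \Lambda^*(\varepsilon)$ and

$$\Delta(\varepsilon,u) \ge \Gamma_+^*(\varepsilon,u) - \Lambda^*(\varepsilon) \ge \left[\sup_{\theta\ge 0} h(\theta)\right] - \Lambda^*(\varepsilon) \ge h(0) - \Lambda^*(\varepsilon) = 0.$$

Again it suffices to show $h'(0) > 0$, where

$$h'(0) = -e^{-\Lambda(\theta^*)}E[U(X)e^{\theta^*F(X)}] - u = -\theta^*e^{-\Lambda(\theta^*)}E[F(X)U(X)e^{\tilde{\theta}F(X)}] - u,$$

with $\tilde{\theta} = \tilde{\theta}(X) \in (0,\theta^*)$. Then,

$$h'(0) \ge -\theta^*e^{-\Lambda(\theta^*)}\sup_{\theta\in(0,\theta^*)} g(\theta) - u \ge -\theta^*e^{-\Lambda(\theta^*)}\gamma/2 - u,$$

which is strictly positive, as long as $u < u_0 = u_0(\varepsilon)$, with the same $u_0$ as
before. □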
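Proof of Theorem 4.2. Assume first that $\gamma > 0$. Following the derivation
of (4.4) in the proof of Theorem 4.1, we have that for any $0 < \varepsilon < \bar{F}$
and any $u, \phi > 0$,

$$\Delta(\varepsilon,u) \ge -\phi u - \Lambda_+(\theta^*,\phi) + \Lambda(\theta^*). \tag{4.5}$$

At this point, most of the required work has been done. What remains is
to write the above expression as a second-order Taylor expansion around
$\varepsilon = 0$, so that, with $u = \gamma\varepsilon/4$ and $\phi = \gamma\varepsilon$, the right-hand side of (4.5) is
approximately bounded below by

$$-\frac{\gamma^2\varepsilon^2}{4} - \frac{\varepsilon^2}{2}\left.\frac{d^2}{d\varepsilon^2}\Lambda_+(\varepsilon,\gamma\varepsilon)\right|_{\varepsilon=0} + \frac{\varepsilon^2}{2}\left.\frac{d^2}{d\varepsilon^2}\Lambda(\varepsilon)\right|_{\varepsilon=0} = \frac{\gamma^2\varepsilon^2}{4}.$$
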

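We proceed to make this approximation rigorous. Let $\eta := \gamma^2/10 > 0$ in
parts (iii) and (iv) of Lemma 4.1, and choose and fix a $\delta \in (0,\eta/4)$ smaller than
the resulting $\tau$ in part (iv). Since $\Lambda''(\theta)$ is continuous and $\Lambda''(0) = 1$, we can
choose $\theta_0 > 0$ small enough so that $|\Lambda''(\theta) - 1| < \delta\Lambda''(\theta)$ for all $0 < \theta < \theta_0$.
Let $\varepsilon_0$ be the minimum of $\Lambda'(\theta_0)$ and the two quantities $\bar{\varepsilon}$ in parts (iii)
and (iv) of the lemma. Then $\theta^*(\varepsilon) < \theta_0$ for all $0 < \varepsilon < \varepsilon_0$, and moreover,
by the mean value theorem, $\varepsilon = \Lambda'(\theta^*) = \theta^*\Lambda''(\tilde{\theta})$
for some $0 < \tilde{\theta} < \theta^* < \theta_0$, so that

$$\left|\frac{\theta^*(\varepsilon)}{\varepsilon} - 1\right| < \delta < \tau \qquad \text{for all } 0 < \varepsilon < \varepsilon_0. \tag{4.6}$$

Now for any $\varepsilon < \varepsilon_0$, let $u = \gamma\varepsilon/4$ and $\phi = \gamma\varepsilon$ in (4.5); using (4.6) and noting
that $\Lambda(\theta^*)$ is nondecreasing in $\theta^*$,

$$\Delta(\varepsilon,\gamma\varepsilon/4) \ge -\frac{\gamma^2\varepsilon^2}{4} - \Lambda_+(\theta^*(\varepsilon),\gamma\varepsilon) + \Lambda((1-\delta)\varepsilon)$$

$$\ge -\frac{\gamma^2\varepsilon^2}{4} - \frac{1}{2}(1 - \gamma^2 + \eta)\varepsilon^2 + \frac{1}{2}(1-\eta)(1-\delta)^2\varepsilon^2$$

$$= \frac{\varepsilon^2}{4}\,[\gamma^2 + 2(1-\eta)\delta^2 - 4(1-\eta)\delta - 4\eta]$$

$$\ge \frac{\varepsilon^2}{4}\,[\gamma^2 - 5\eta] = \frac{\gamma^2\varepsilon^2}{8},$$

where the second inequality follows from parts (iii) and (iv) of Lemma 4.1
with $(1-\delta)$ in place of $\delta$, $\beta = \gamma$, and $t = \theta^*(\varepsilon)/\varepsilon$, and the last inequality
uses $0 < \delta < \eta/4$.

Finally, the same result holds in the case $\gamma < 0$, either by considering $-U$
in place of $U$, or by replacing $\Lambda_+^*$ by $\Gamma_+^*$ in the above argument, as in the
proof of Theorem 4.1. □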
5. Proofs of Theorems 2.1, 2.2 and 2.3. We begin with a simple, general
upper bound in the spirit of the results in [3].

Lemma 5.1. Let $F_1, F_2, \ldots, F_m$ be an arbitrary (finite) collection of measurable
functions from $\mathbb{R}$ to $\mathbb{R}$. For any constants $c_1, c_2, \ldots, c_m$ we have

$$\log\Pr\left\{\sum_{i=1}^n F_j(X_i) \ge nc_j \text{ for all } j = 1,2,\ldots,m\right\} \le -n\inf_{Q\in E_m} H(Q\|P),$$

where $E_m$ is the set of all probability measures $Q$ on $\mathbb{R}$ such that $\int F_j\,dQ \ge c_j$
for all $j = 1, 2, \ldots, m$.



Proof. Let $A$ denote the event of interest in the lemma, and assume
without loss of generality that it has nonzero probability. Write $P_A$ for the
probability measure on $\mathbb{R}^n$ obtained by conditioning the product measure
$P^n$ on $A$, and note that, by definition,

$$-\log\Pr(A) = -\log P^n(A) = H(P_A\|P^n).$$

Expressing $P_A$ as the product of the conditional measures $P_{A,i}(\cdot\,|\,x_1,\ldots,x_{i-1})$
for $i = 1,2,\ldots,n$, we can expand the logarithm inside the relative entropy
to obtain

$$-\log\Pr(A) = \sum_{i=1}^n E[H(P_{A,i}(\cdot\,|\,Y_1,\ldots,Y_{i-1})\|P)],$$

where the random variables $Y_1, Y_2, \ldots, Y_n$ have joint distribution given by
the measure $P_A$. Using the fact that relative entropy is convex in its first
argument (see, e.g., [4], Chapter 6), Jensen's inequality gives

$$-\log\Pr(A) \ge \sum_{i=1}^n H(Q_i\|P),$$

where $Q_i$ denotes the $i$th marginal of $P_A$ on $\mathbb{R}$. Using convexity again,

$$-\log\Pr(A) \ge n\sum_{i=1}^n \frac{1}{n}H(Q_i\|P) \ge nH(Q\|P),$$

where $Q = \frac{1}{n}\sum_{i=1}^n Q_i$. To complete the proof it suffices to show that $Q \in E_m$.
Indeed, for any $j = 1, 2, \ldots, m$,

$$\int F_j\,dQ = \frac{1}{n}\sum_{i=1}^n \int F_j\,dQ_i = E\left[\frac{1}{n}\sum_{i=1}^n F_j(Y_i)\right] \ge c_j,$$

where the last inequality holds since the joint distribution of the $\{Y_i\}$ is $P_A$,
which is entirely supported on $A$ by definition. □
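
A simple special case makes Lemma 5.1 concrete. The sketch below assumes (for illustration only; this example is not in the text) that $P$ is Bernoulli($p$), $m = 1$, $F_1(x) = x$ and $c_1 = c > p$, so that the infimum of $H(Q\|P)$ over $E_1$ is the Bernoulli relative entropy $H(c\|p)$. It then compares the exact binomial tail with the entropy bound. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def bernoulli_relative_entropy(q, p):
    """H(q||p) between Bernoulli(q) and Bernoulli(p), for 0 < q, p < 1."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))


def log_tail_probability(n, k_min, p):
    """log Pr{S_n >= k_min} for S_n ~ Binomial(n, p), by direct summation."""
    total = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
                for k in range(k_min, n + 1))
    return math.log(total)


def lemma_sides(n, c, p):
    """Return (log Pr{S_n >= n*c}, -n*H(c||p)); c is a Fraction with c > p."""
    k_min = math.ceil(n * c)          # exact, since c is a Fraction
    lhs = log_tail_probability(n, k_min, p)
    rhs = -n * bernoulli_relative_entropy(float(c), p)
    return lhs, rhs


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, lemma_sides(n, Fraction(7, 10), 0.5))
```

In each case the first returned value should not exceed the second, which is the content of the lemma in this special case.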
Next we give the proof of Theorem 2.2. The first upper bound follows
from Lemma 5.1, the second is derived using the classical Chernoff bound,
and the positivity of the exponent comes from the domination assumption
$m(\beta) < \infty$.

Proof of Theorem 2.2. Throughout, we assume, without loss of generality,
that $\mu = \nu = 0$. For part (i), taking $F_1 = F$, $F_2 = -U$, $c_1 = \varepsilon$ and
$c_2 = -u$, Lemma 5.1 immediately yields the required bound. Part (ii) follows
by the usual Chernoff argument: For any pair of $\theta_1, \theta_2 \ge 0$,

$$\Pr\{S_n > n\varepsilon,\ T_n < nu\} \le \Pr\{S_n \ge n\varepsilon,\ T_n \le nu\}$$

$$= E[\mathbb{1}_{\{S_n \ge n\varepsilon\}}\mathbb{1}_{\{T_n \le nu\}}]$$

$$\le E[\exp\{\theta_1(S_n - n\varepsilon)\}\exp\{-\theta_2(T_n - nu)\}]$$

$$= \exp\{-n[\theta_1\varepsilon - \theta_2u - \Lambda_+(\theta_1,\theta_2)]\}.$$

The stated result is obtained upon taking the supremum over all $\theta_1, \theta_2 \ge 0$
in the exponent.
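
The Chernoff step can be checked on a small discrete example where the screened probability is computed exactly. In the sketch below, the four equally likely values of $(F(X), U(X))$, the sample size and the thresholds are all assumptions chosen for illustration. The exact probability of $\{S_n \ge n\varepsilon,\ T_n \le nu\}$ is obtained by convolving the joint law of $(S_n, T_n)$ and compared with $\exp\{-n[\theta_1\varepsilon - \theta_2u - \Lambda_+(\theta_1,\theta_2)]\}$.

```python
import math
from collections import defaultdict

# Hypothetical law of (F(X), U(X)): four equally likely atoms, both means zero.
ATOMS = [((2, 1), 0.25), ((-2, -1), 0.25), ((1, -1), 0.25), ((-1, 1), 0.25)]


def lambda_plus(theta1, theta2):
    """log E exp{theta1*F(X) - theta2*U(X)} for the atoms above."""
    return math.log(sum(p * math.exp(theta1 * f - theta2 * u) for (f, u), p in ATOMS))


def exact_probability(n, s_min, t_max):
    """Pr{S_n >= s_min, T_n <= t_max}, by exact convolution of the joint law."""
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (s, t), q in dist.items():
            for (f, u), p in ATOMS:
                nxt[(s + f, t + u)] += q * p
        dist = nxt
    return sum(q for (s, t), q in dist.items() if s >= s_min and t <= t_max)


def chernoff_bound(n, eps, u, theta1, theta2):
    """exp{-n[theta1*eps - theta2*u - lambda_plus(theta1, theta2)]}."""
    return math.exp(-n * (theta1 * eps - theta2 * u - lambda_plus(theta1, theta2)))


if __name__ == "__main__":
    n, eps, u = 20, 0.5, 0.1
    print("exact:", exact_probability(n, n * eps, n * u))
    for theta in [(0.1, 0.0), (0.3, 0.2), (0.5, 0.5)]:
        print(theta, chernoff_bound(n, eps, u, *theta))
```

Every choice of nonnegative $(\theta_1, \theta_2)$ gives a valid upper bound; optimizing over the pair gives the exponent in part (ii).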

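Finally, for part (iii) choose and fix an arbitrary $\alpha \in (0,1)$. Taking
$\theta_2 = \alpha\varepsilon\theta_1/u$ in the definition of $\Lambda_+^*(\varepsilon,u)$ yields

$$\Lambda_+^*(\varepsilon,u) \ge \sup_{\theta\ge 0}\,[\theta(1-\alpha)\varepsilon - \Lambda_0(\theta)], \tag{5.1}$$

where $\Lambda_0(\theta) := \Lambda_+(\theta, \alpha\varepsilon\theta/u) < \infty$ for all $\theta \ge 0$ because $m(\beta) < \infty$ for all $\beta > 0$.
Now for any $\theta \ge 0$, let $X_\theta$ be a random variable whose distribution has
Radon-Nikodym derivative with respect to that of $X$ given by the density

$$g_\theta(x) := \frac{\exp\{\theta[F(x) - ((\alpha\varepsilon)/u)U(x)]\}}{E[\exp\{\theta[F(X) - ((\alpha\varepsilon)/u)U(X)]\}]},$$

so that $g_0 \equiv 1$ and $X_0 = X$. Obviously $\Lambda_0(0) = 0$, and simple calculus shows
that $\Lambda_0'(\theta) = E[F(X_\theta) - \frac{\alpha\varepsilon}{u}U(X_\theta)]$ so that $\Lambda_0'(0) = 0$; the dominated convergence
theorem justifies the differentiation under the integral, and also shows
that $\Lambda_0'(\theta)$ is continuous in $\theta$ for all $\theta \ge 0$, since $F(X)$ and $U(X)$ have finite
first moments and $m(\beta) < \infty$ for all $\beta > 0$.
Pick $\theta_0 > 0$ small enough so that

$$\sup\{\Lambda_0'(\theta) : \theta \in [0,\theta_0]\} \le \Lambda_0'(0) + \frac{(1-\alpha)\varepsilon}{2}.$$

Restricting the range of the supremum in (5.1) to $[0,\theta_0]$ yields

$$\Lambda_+^*(\varepsilon,u) \ge \sup_{0\le\theta\le\theta_0}\frac{\theta(1-\alpha)\varepsilon}{2} = \frac{\theta_0(1-\alpha)\varepsilon}{2},$$

which is strictly positive. □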
The main technical step in the following proof is the (asymptotic) large
deviations lower bound; it is established by a change-of-measure argument
combined with regularization of the random variables of interest, as in
Cramér's theorem. The main difference from the classical case is that, here,
the domination assumption $m(\beta) < \infty$ replaces the usual condition on the
existence of exponential moments in a neighborhood of the origin.

Proof of Theorem 2.3. As above, we assume without loss of generality
that $\mu = \nu = 0$. Write $\theta$ for an arbitrary pair of nonnegative $(\theta_1,\theta_2)$,
and write $G : \mathbb{R} \to \mathbb{R}^2$ for the function $G(x) = (F(x), -U(x))$, $x \in \mathbb{R}$, so that
$\Lambda_+(\theta) = \log E[\exp\{\langle\theta, G(X)\rangle\}]$ and

$$\Lambda_+^*(\varepsilon,u) = \sup_\theta\,[\langle\theta,(\varepsilon,-u)\rangle - \Lambda_+(\theta)],$$

where $\langle\cdot,\cdot\rangle$ denotes the usual Euclidean inner product. Note that, since
$m(\beta) < \infty$ for all $\beta > 0$, we have $\Lambda_+(\theta) < \infty$ as long as $\theta_2 > 0$, and $\Lambda_+(0) = 0$.
Moreover, since $E(G(X)) = 0$, the dominated convergence theorem implies
that $\Lambda_+(\theta)$ is differentiable, with

$$\nabla\Lambda_+(\theta) = E[G(X)\exp\{\langle\theta, G(X)\rangle - \Lambda_+(\theta)\}], \tag{5.2}$$

for all $\theta$ with $\theta_2 > 0$.

In view of Theorem 2.2(ii), in order to establish the limiting relation (2.2),
it suffices to prove the asymptotic lower bound,

$$\liminf_{n\to\infty}\frac{1}{n}\log\Pr\{S_n > n\varepsilon,\ T_n < nu\} \ge -\Lambda_+^*(\varepsilon,u). \tag{5.3}$$

To that end, consider three cases. First, if $\Lambda_+^*(\varepsilon,u) = \infty$, (5.3) is trivially
true. Second, assume that $\Lambda_+^*(\varepsilon,u) < \infty$ and there exists $\theta$ such that

$$E[G(X)\exp\{\langle\theta, G(X)\rangle - \Lambda_+(\theta)\}] = (\varepsilon,-u). \tag{5.4}$$

Fixing this $\theta$, define a new sequence of i.i.d. random variables $X', X_1', X_2', \ldots$
with common distribution $P'$, where

$$\frac{dP'}{dP}(x) = \exp\{\langle\theta, G(x)\rangle - \Lambda_+(\theta)\}, \qquad x \in \mathbb{R}.$$

Write $S_n'$ and $T_n'$ for the corresponding partial sums, and choose and fix
$\delta > 0$; then, $\frac{1}{n}\log\Pr\{S_n > n\varepsilon,\ T_n < nu\}$ is bounded below by

$$\frac{1}{n}\log\Pr\{n\varepsilon < S_n < n(\varepsilon+\delta),\ n(u-\delta) < T_n < nu\}$$

$$= \frac{1}{n}\log E\left[\prod_{i=1}^n \frac{dP}{dP'}(X_i')\,\mathbb{1}_{\{n\varepsilon < S_n' < n(\varepsilon+\delta)\}}\mathbb{1}_{\{n(u-\delta) < T_n' < nu\}}\right]$$

$$= \Lambda_+(\theta) - \langle\theta,(\varepsilon,-u)\rangle + \frac{1}{n}\log E\left[e^{-\theta_1(S_n' - n\varepsilon) + \theta_2(T_n' - nu)}\,\mathbb{1}_{B_n}\right]$$

$$\ge \Lambda_+(\theta) - \langle\theta,(\varepsilon,-u)\rangle - (\theta_1+\theta_2)\delta + \frac{1}{n}\log\Pr(B_n), \tag{5.5}$$

where $B_n$ denotes the event $B_n := \{n\varepsilon < S_n' < n(\varepsilon+\delta)\} \cap \{n(u-\delta) < T_n' < nu\}$,
and the last inequality follows from the observation that the exponential
inside the expectation is bounded below by $\exp\{-\theta_1n\delta - \theta_2n\delta\}$ on
$B_n$. Note that our assumption (5.4) implies that $E[G(X')] = (\varepsilon,-u)$, and
since $m(\beta) < \infty$ for all $\beta$, $F(X')$ and $U(X')$ have finite second moments.
Therefore, from the central limit theorem we obtain

$$\liminf_{n\to\infty}\frac{1}{n}\log\Pr(B_n) = 0,$$

as long as $\delta > 0$ is fixed. Noting also that $\Lambda_+(\theta) - \langle\theta,(\varepsilon,-u)\rangle \ge -\Lambda_+^*(\varepsilon,u)$,
taking $n \to \infty$ in (5.5) we obtain

$$\liminf_{n\to\infty}\frac{1}{n}\log\Pr\{S_n > n\varepsilon,\ T_n < nu\} \ge -\Lambda_+^*(\varepsilon,u) - (\theta_1+\theta_2)\delta, \tag{5.6}$$

for each $\delta > 0$, and taking $\delta \downarrow 0$ in the above right-hand side yields (5.3).
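
The change of measure in the second case rests on the identity (5.2): the mean of $G$ under the tilted law $P'$ equals the gradient of $\Lambda_+$ at $\theta$, so that condition (5.4) says precisely that the tilted mean is $(\varepsilon, -u)$. The following sketch checks this on a small discrete example; the four atoms for $(F(X), U(X))$ and the value of $\theta$ are assumptions made for illustration, and the gradient is approximated by central differences.

```python
import math

# Hypothetical law of (F(X), U(X)): four equally likely atoms, both means zero.
ATOMS = [((2.0, 1.0), 0.25), ((-2.0, -1.0), 0.25), ((1.0, -1.0), 0.25), ((-1.0, 1.0), 0.25)]


def lambda_plus(theta):
    """log E exp{<theta, G(X)>} with G(x) = (F(x), -U(x))."""
    return math.log(sum(p * math.exp(theta[0] * f - theta[1] * u) for (f, u), p in ATOMS))


def tilted_probabilities(theta):
    """Atom probabilities under P', where dP'/dP = exp{<theta, G> - lambda_plus(theta)}."""
    lam = lambda_plus(theta)
    return [p * math.exp(theta[0] * f - theta[1] * u - lam) for (f, u), p in ATOMS]


def tilted_mean_of_G(theta):
    """E[G(X')] under the tilted law, i.e. the right-hand side of (5.2)."""
    probs = tilted_probabilities(theta)
    mean_f = sum(q * f for ((f, _u), _p), q in zip(ATOMS, probs))
    mean_minus_u = sum(q * (-u) for ((_f, u), _p), q in zip(ATOMS, probs))
    return mean_f, mean_minus_u


def numerical_gradient(theta, h=1e-6):
    """Central-difference approximation of the gradient of lambda_plus at theta."""
    d1 = (lambda_plus((theta[0] + h, theta[1])) - lambda_plus((theta[0] - h, theta[1]))) / (2 * h)
    d2 = (lambda_plus((theta[0], theta[1] + h)) - lambda_plus((theta[0], theta[1] - h))) / (2 * h)
    return d1, d2


if __name__ == "__main__":
    theta = (0.3, 0.2)
    print("tilted mean of G:  ", tilted_mean_of_G(theta))
    print("numerical gradient:", numerical_gradient(theta))
```

The two printed pairs should agree to within the finite-difference error; at $\theta = 0$ both reduce to $E[G(X)] = 0$.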

The third and last case is when $\Lambda_+^*(\varepsilon,u) < \infty$ but there is no $\theta$ such that
(5.4) is satisfied. We will repeat the above argument, but instead of the
sequence $\{G(X_n)\}$ we will consider the new i.i.d. sequence $\{H(X_n)\}$ which
is obtained by adding to the $\{G(X_n)\}$ i.i.d. Gaussians with small mean and
variance. Specifically, choose and fix arbitrary $\delta > 0$ and $t > 0$, and let

$$H(X_n) := G(X_n) + tZ_n + \left(\frac{\delta}{2},\frac{\delta}{2}\right), \qquad n \ge 1,$$

where the $\{Z_n\}$ are i.i.d. with each $Z_n$ consisting of two independent standard
Gaussian components, independent of the $\{X_n\}$. Let

$$\Lambda_t(\theta) := \log E[\exp\{\langle\theta, H(X)\rangle\}],$$

and note that

$$\Lambda_t(\theta) = \Lambda_+(\theta) + t^2(\theta_1^2 + \theta_2^2)/2 + \delta(\theta_1 + \theta_2)/2 \ge \Lambda_+(\theta) \ge 0, \tag{5.7}$$

where the last inequality follows by applying Jensen's inequality to the
logarithm in the definition of $\Lambda_+(\theta)$ and recalling that $G(X)$ has zero mean.
Consequently,

$$\Lambda_t^*(\varepsilon,u) := \sup_\theta\,[\langle\theta,(\varepsilon,-u)\rangle - \Lambda_t(\theta)] \le \Lambda_+^*(\varepsilon,u) < \infty. \tag{5.8}$$

From (5.7) and (5.8) it follows that, for any given $\theta$, the function

$$L(\theta) := \langle\theta,(\varepsilon,-u)\rangle - \Lambda_t(\theta) \le \Lambda_+^*(\varepsilon,u) - t^2(\theta_1^2 + \theta_2^2)/2 - \delta(\theta_1 + \theta_2)/2$$

has $\sup_{\theta:\,\theta_1+\theta_2>R} L(\theta) \to -\infty$ as $R \to \infty$. Moreover, in view of (5.2), $L(\theta)$
is differentiable, and therefore the supremum in the definition of $\Lambda_t^*(\varepsilon,u)$ is
achieved for some finite $\theta$ which satisfies the analog of (5.4), that is, with $H$
and $\Lambda_t(\theta)$ in place of $G$ and $\Lambda_+(\theta)$, respectively. So we can conclude from
the previous argument that the lower bound (5.3) holds with $H$ in place of
$G$. In fact, for the specific value of $\delta > 0$ we chose in the definition of $H$, the
same argument used to establish (5.5) and then (5.6) yields the following
asymptotic lower bound:

$$\liminf_{n\to\infty}\frac{1}{n}\log\Pr\left\{n\varepsilon < S_n + t\sqrt{n}\,W + \frac{n\delta}{2} < n(\varepsilon+\delta),\ n(u-\delta) < T_n + t\sqrt{n}\,V - \frac{n\delta}{2} < nu\right\}$$

$$\ge -\Lambda_t^*(\varepsilon,u) - (\theta_1+\theta_2)\delta$$

$$\ge -\Lambda_+^*(\varepsilon,u) - (\theta_1+\theta_2)\delta$$

$$> -\infty, \tag{5.9}$$

where $W, V$ are independent standard Gaussian random variables independent
of the $\{X_n\}$. On the other hand, a simple union bound gives

ne<S n + ty/riW + y < n(e + 5), 

n(u — 5) <T n + t\fnV — — < mi| 
< Pr{ne < S n < n(s + 25),n(u - 25) <T n < nu} 



(5.10) 



( ,U) I log P r { W ^,|,|^} £ -| 



where the last probability is easily bounded as 

9_ 

At 2 ' 

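Combining the bounds (5.9), (5.10) and (5.11) yields

$$-\Lambda_+^*(\varepsilon,u) - (\theta_1+\theta_2)\delta \le \max\left\{-\frac{\delta^2}{4t^2},\ \liminf_{n\to\infty}\frac{1}{n}\log\Pr\{n\varepsilon < S_n < n(\varepsilon+2\delta),\ n(u-2\delta) < T_n < nu\}\right\}.$$

Letting $t \downarrow 0$ implies that

$$\liminf_{n\to\infty}\frac{1}{n}\log\Pr\{S_n > n\varepsilon,\ T_n < nu\} \ge -\Lambda_+^*(\varepsilon,u) - (\theta_1+\theta_2)\delta,$$

and letting $\delta \downarrow 0$ establishes (5.3) and thus completes the proof of (2.2).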

Finally, in order to show that the two rate functions are identical, it suffices
to show that $\Lambda_+^*(\varepsilon,u)$ is no greater than the entropy $H(E\|P)$, since the
reverse inequality follows from the upper bound in Theorem 2.2(i) combined
with the asymptotic relation (2.2) we just established. Indeed, for arbitrary
$\theta_1, \theta_2 \ge 0$ and any $Q \in E$,

$$\theta_1\varepsilon - \theta_2u - \log E[\exp\{\theta_1F(X) - \theta_2U(X)\}]$$

$$= \theta_1\varepsilon - \theta_2u - \log\int dQ(x)\,\frac{dP}{dQ}(x)\exp\{\theta_1F(x) - \theta_2U(x)\}$$

$$\le \theta_1\varepsilon - \theta_2u - \int dQ(x)\log\left[\frac{dP}{dQ}(x)\exp\{\theta_1F(x) - \theta_2U(x)\}\right]$$

$$= \theta_1\left[\varepsilon - \int F\,dQ\right] - \theta_2\left[u - \int U\,dQ\right] + H(Q\|P)$$

$$\le H(Q\|P),$$

where the first inequality is simply Jensen's inequality and the second follows
from the assumption that $Q \in E$. Taking the supremum of both sides over
all $\theta_1, \theta_2 \ge 0$ and then the infimum over all $Q \in E$ establishes the inequality
$\Lambda_+^*(\varepsilon,u) \le H(E\|P)$ and completes the proof. □

It is now a simple matter to deduce Theorem 2.1 from Theorems 2.2 
and 2.3. 

Proof of Theorem 2.1. Again we assume without loss of generality
that $\mu = \nu = 0$. For part (i), since $E[e^{\theta F(X)}]$ is infinite for all $\theta > 0$, it is well
known that

$$\lim_{n\to\infty}\frac{1}{n}\log\Pr\{S_n > n\varepsilon\} = 0; \tag{5.12}$$

see, for example, [5], Example 9.8, page 78. To see that $H(\Sigma\|P) := \inf_{Q\in\Sigma} H(Q\|P) = 0$
note that, from Lemma 5.1, we have $\log\Pr\{S_n > n\varepsilon\} \le -nH(\Sigma\|P)$.
This combined with (5.12) implies that $H(\Sigma\|P) = 0$. The limit
in part (ii) is an immediate consequence of Theorem 2.3, and the fact that
the exponent is strictly nonzero follows from Theorem 2.2(iii) and the
identification of the rate function as the entropy given in Theorem 2.3. □